colossalai.kernel.cuda_native.scaled_softmax
This code is adapted from NVIDIA Megatron with some changes.
- class colossalai.kernel.cuda_native.scaled_softmax.AttnMaskType(value)
An enumeration of attention mask types (pad or causal).
- class colossalai.kernel.cuda_native.scaled_softmax.ScaledUpperTriangMaskedSoftmax(*args, **kwargs)
Fused operation that performs the following three operations in sequence:
1. Scale the tensor.
2. Apply an upper triangular mask (typically used in GPT models).
3. Perform softmax.
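The three steps above can be sketched in plain Python as an unfused reference. This is a minimal illustration of the semantics only, not the CUDA kernel: the function name `scaled_causal_softmax` is hypothetical, and the input is assumed to be a single square score matrix given as nested lists.

```python
import math

def scaled_causal_softmax(scores, scale):
    """Unfused reference: scale, mask the upper triangle, softmax each row.

    scores: square list of lists (query position x key position).
    Hypothetical helper for illustration; not part of colossalai.
    """
    out = []
    for i, row in enumerate(scores):
        # Steps 1 and 2: scale, keeping only key positions j <= i.
        visible = [x * scale for x in row[: i + 1]]
        # Step 3: numerically stable softmax over the visible positions.
        peak = max(visible)
        exps = [math.exp(v - peak) for v in visible]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Masked (future) positions receive probability 0.
        out.append(probs + [0.0] * (len(row) - i - 1))
    return out

probs = scaled_causal_softmax(
    [[1.0, 2.0, 3.0], [1.0, 1.0, 5.0], [0.0, 0.0, 0.0]], scale=0.5
)
```

Each row of `probs` sums to 1, and every entry above the diagonal is 0, so a query position never attends to later key positions.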
- class colossalai.kernel.cuda_native.scaled_softmax.ScaledMaskedSoftmax(*args, **kwargs)
Fused operation that performs the following three operations in sequence:
1. Scale the tensor.
2. Apply the mask.
3. Perform softmax.
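As a rough reference for these three steps with an arbitrary mask, the sketch below works on nested lists. It assumes that a `True` mask entry means "masked out" and that masked scores are replaced by a large negative constant before softmax. Both conventions, the constant `MASK_FILL`, and the function name are assumptions made for illustration.

```python
import math

MASK_FILL = -10000.0  # assumed fill value for masked positions

def scaled_masked_softmax(scores, mask, scale):
    """Unfused reference: scale, apply a boolean mask, softmax each row.

    mask[i][j] is True where position j must be ignored (assumed convention).
    Hypothetical helper for illustration; not part of colossalai.
    """
    out = []
    for row, mask_row in zip(scores, mask):
        # Steps 1 and 2: scale unmasked scores, overwrite masked ones.
        scaled = [MASK_FILL if masked else x * scale
                  for x, masked in zip(row, mask_row)]
        # Step 3: numerically stable softmax over the row.
        peak = max(scaled)
        exps = [math.exp(v - peak) for v in scaled]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

scores = [[2.0, 2.0, 4.0], [1.0, 3.0, 3.0]]
mask = [[False, False, True], [True, False, False]]
probs = scaled_masked_softmax(scores, mask, scale=0.5)
```

A finite fill value, rather than negative infinity, keeps a fully masked row from producing NaN. Such a row comes out as a uniform distribution instead.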
- class colossalai.kernel.cuda_native.scaled_softmax.FusedScaleMaskSoftmax(input_in_fp16, input_in_bf16, attn_mask_type, scaled_masked_softmax_fusion, mask_func, softmax_in_fp32, scale)
Fused operation: scaling + mask + softmax
- Parameters
input_in_fp16 – Flag indicating whether the input is in fp16 format.
input_in_bf16 – Flag indicating whether the input is in bf16 format.
attn_mask_type – Attention mask type (pad or causal).
scaled_masked_softmax_fusion – Flag indicating whether the user wants to use softmax fusion.
mask_func – Mask function to be applied.
softmax_in_fp32 – If True, softmax is performed at fp32 precision.
scale – Scaling factor applied to the input tensor.
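To show how `mask_func` and `scale` might compose when fusion is not used, here is a small plain-Python sketch. It assumes that `mask_func` takes the scores and a mask and returns the masked scores, that `scale` may be `None` to skip scaling, and that scaling happens before masking. The names `example_mask_func`, `softmax_row`, and `scale_mask_softmax` are hypothetical. The dtype flags (`input_in_fp16`, `input_in_bf16`, `softmax_in_fp32`) are not modelled here.

```python
import math

def example_mask_func(scores, mask):
    """Hypothetical mask_func: overwrite masked positions with a large negative value."""
    return [[-10000.0 if m else s for s, m in zip(srow, mrow)]
            for srow, mrow in zip(scores, mask)]

def softmax_row(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scale_mask_softmax(scores, mask, mask_func, scale=None):
    """Assumed unfused path: scale -> mask_func -> softmax over the last axis."""
    if scale is not None:
        scores = [[s * scale for s in row] for row in scores]
    if mask is not None:
        scores = mask_func(scores, mask)
    return [softmax_row(row) for row in scores]

result = scale_mask_softmax(
    [[0.0, 4.0], [4.0, 4.0]],
    [[False, True], [False, False]],
    example_mask_func,
    scale=0.25,
)
```

Passing the mask function as a callable keeps the masking policy (fill value, mask polarity) outside the softmax routine, which matches the role the `mask_func` parameter appears to play.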