colossalai.nn.layer.parallel_sequence
- class colossalai.nn.layer.parallel_sequence.TransformerSelfAttentionRing(hidden_size, num_attention_heads, attention_dropout, attention_mask_func, layer_number, apply_query_key_layer_scaling=False, convert_fp16_to_fp32_in_softmax=False, attn_mask_type=AttnMaskType.padding, masked_softmax_fusion=True, fp16=False, bf16=False)
Parallel self-attention layer abstract class. Self-attention layer takes input with size [b, s, h] and returns output of the same size.
- Parameters
hidden_size (int) – hidden size.
num_attention_heads (int) – number of attention heads.
attention_dropout (float) – dropout probability for attention layer.
attention_mask_func (typing.Callable) – Mask function to be applied.
layer_number (int) – number of layers.
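The documentation does not spell out what attention_mask_func must look like. The sketch below shows one plausible shape for such a callable, assuming the common convention in which it receives the raw attention scores and a boolean mask and returns the scores with masked positions pushed to a large negative value before softmax. The names example_mask_func and MASK_FILL, the two-argument signature, and the fill value are assumptions for illustration, not part of the documented API. Plain Python lists stand in for tensors so the example runs without any dependencies.

```python
import math
from typing import List

# Assumed fill value; the documentation does not specify one.
MASK_FILL = -10000.0


def example_mask_func(scores: List[List[float]],
                      mask: List[List[bool]]) -> List[List[float]]:
    """Hypothetical mask function: return a copy of `scores` in which
    every position where `mask` is True is replaced by MASK_FILL."""
    return [
        [MASK_FILL if m else s for s, m in zip(score_row, mask_row)]
        for score_row, mask_row in zip(scores, mask)
    ]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Two query positions attending over three key positions.
scores = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
# True marks positions that must not be attended to (e.g. padding).
mask = [[False, False, True], [False, True, True]]

masked = example_mask_func(scores, mask)
weights = [softmax(row) for row in masked]
```

After the softmax, masked positions receive a weight that is effectively zero, and each row of weights still sums to one over the unmasked positions. With the real layer, an equivalent callable would presumably operate on tensors rather than nested lists.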