colossalai.nn.layer.parallel_sequence

class colossalai.nn.layer.parallel_sequence.TransformerSelfAttentionRing(hidden_size, num_attention_heads, attention_dropout, attention_mask_func, layer_number, apply_query_key_layer_scaling=False, convert_fp16_to_fp32_in_softmax=False, attn_mask_type=AttnMaskType.padding, masked_softmax_fusion=True, fp16=False, bf16=False)

Abstract class for a parallel self-attention layer. The layer takes an input of size [b, s, h] (batch size, sequence length, hidden size) and returns an output of the same size.

Parameters

hidden_size (int) – hidden size
kv_channels (int) – channels of key/value tensor
num_attention_heads (int) – number of attention heads
attention_dropout (float) – dropout probability for attention layer
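
The shape contract described above can be made concrete with a small bookkeeping sketch. It is illustrative only and does not call ColossalAI. The helper name `attention_shapes`, the `world_size` argument, the even split of the sequence across ranks, and the per-head size of `hidden_size // num_attention_heads` are all assumptions made for the example, not documented behaviour of this class.

```python
def attention_shapes(batch, seq_len, hidden_size, num_attention_heads, world_size=1):
    """Return the tensor shapes one rank would see (hypothetical helper).

    Assumes the sequence is split evenly across `world_size` ranks and that
    the hidden dimension is split evenly across the attention heads.
    """
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    if seq_len % world_size != 0:
        raise ValueError("seq_len must be divisible by world_size")

    head_dim = hidden_size // num_attention_heads  # assumed per-head size
    sub_seq = seq_len // world_size                # tokens held by one rank

    return {
        # Local input and output have the same [b, s, h] layout.
        "input": (batch, sub_seq, hidden_size),
        "per_head_query": (batch, num_attention_heads, sub_seq, head_dim),
        # Local queries attend to keys from the whole sequence.
        "scores": (batch, num_attention_heads, sub_seq, seq_len),
        "output": (batch, sub_seq, hidden_size),
    }
```

With `world_size=1` the input and output shapes reduce to the plain [b, s, h] stated in the class description.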

class colossalai.nn.layer.parallel_sequence.RingAV(*args, **kwargs)

Calculates AV, the product of the attention matrix A and the value tensor V, in a ring-exchange style.
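
To illustrate what a ring exchange for AV might look like, the following single-process sketch simulates several ranks with plain Python lists. It is not the ColossalAI implementation: the function name `ring_av`, the list-based layout, and the assumption that each rank holds an equal-sized chunk of V plus the attention rows for its own queries are all hypothetical.

```python
def ring_av(a_blocks, v_chunks):
    """Simulate a ring exchange that computes A @ V chunk by chunk.

    a_blocks[r]: [chunk_len, total_len] attention rows for rank r's queries.
    v_chunks[r]: [chunk_len, head_dim] value rows initially held by rank r.
    Returns, per rank, the [chunk_len, head_dim] block of A @ V.
    """
    world_size = len(v_chunks)
    chunk_len = len(v_chunks[0])
    head_dim = len(v_chunks[0][0])
    out = [[[0.0] * head_dim for _ in a] for a in a_blocks]

    held = list(v_chunks)  # held[r] is the V chunk currently on rank r
    for step in range(world_size):
        for rank in range(world_size):
            # After `step` rotations, rank r holds the chunk from rank r - step.
            src = (rank - step) % world_size
            cols = range(src * chunk_len, (src + 1) * chunk_len)
            for i, a_row in enumerate(a_blocks[rank]):
                for j, col in enumerate(cols):
                    weight = a_row[col]
                    for d in range(head_dim):
                        out[rank][i][d] += weight * held[rank][j][d]
        # Every rank passes its chunk to the next rank in the ring.
        held = [held[(rank - 1) % world_size] for rank in range(world_size)]
    return out
```

Each rank only ever holds one chunk of V at a time, and after `world_size` steps it has accumulated contributions from every chunk.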

class colossalai.nn.layer.parallel_sequence.RingQK(*args, **kwargs)

Calculates QK, the product of the query and key tensors, in a ring-exchange style.
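
A ring exchange for QK can be sketched in the same single-process style. This is a simulation under stated assumptions, not the library's code: `ring_qk` and `matmul_t` are hypothetical names, each simulated rank is assumed to hold an equal-sized chunk of Q and K, and the product is taken as Q times the transpose of K.

```python
def matmul_t(q, k):
    """Return q @ k^T for row-major lists of lists."""
    return [[sum(a * b for a, b in zip(q_row, k_row)) for k_row in k] for q_row in q]


def ring_qk(q_chunks, k_chunks):
    """Simulate a ring exchange that computes Q @ K^T chunk by chunk.

    q_chunks[r], k_chunks[r]: [chunk_len, head_dim] rows held by rank r.
    Returns, per rank, the [chunk_len, total_len] score block for the
    queries local to that rank.
    """
    world_size = len(q_chunks)
    chunk_len = len(k_chunks[0])
    total_len = world_size * chunk_len
    scores = [[[0.0] * total_len for _ in q] for q in q_chunks]

    held = list(k_chunks)  # held[r] is the K chunk currently on rank r
    for step in range(world_size):
        for rank in range(world_size):
            # After `step` rotations, rank r holds the chunk from rank r - step.
            src = (rank - step) % world_size
            partial = matmul_t(q_chunks[rank], held[rank])
            for i, row in enumerate(partial):
                # Write the partial scores into the columns owned by `src`.
                scores[rank][i][src * chunk_len:(src + 1) * chunk_len] = row
        # Every rank passes its chunk to the next rank in the ring.
        held = [held[(rank - 1) % world_size] for rank in range(world_size)]
    return scores
```

After `world_size` steps every rank has seen every K chunk exactly once, so its score block matches the corresponding rows of the full Q K^T product.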

colossalai.nn.layer.parallel_sequence.layers