colossalai.nn.layer.parallel_2p5d.layers

class colossalai.nn.layer.parallel_2p5d.layers.Linear2p5D(in_features, out_features, bias=True, dtype=None, skip_bias_add=False, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)

Linear layer for 2.5D parallelism

Parameters

in_features (int) – size of each input sample
out_features (int) – size of each output sample
bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
skip_bias_add (bool, optional) – If set to True, the bias is not added to the output inside the layer, defaults to False
weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer
bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer
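
The per-device parameter shape can be illustrated without launching a distributed job. The following is a minimal sketch, assuming the weight is split evenly along both dimensions across a q × q device grid; the helper `local_weight_shape` and its `grid_dim` parameter are hypothetical and not part of the documented API.

```python
from typing import Tuple


def local_weight_shape(in_features: int, out_features: int, grid_dim: int) -> Tuple[int, int]:
    """Return the shape of the weight shard held by one device.

    Assumption (not confirmed by this page): both weight dimensions are
    divided evenly by the side length of the device grid.
    """
    if in_features % grid_dim or out_features % grid_dim:
        raise ValueError("in_features and out_features must be divisible by grid_dim")
    return in_features // grid_dim, out_features // grid_dim


# Example: a 1024 x 4096 weight on a 2 x 2 grid.
shard_shape = local_weight_shape(1024, 4096, 2)
print(shard_shape)
```

Under this assumption, in_features and out_features should both be chosen as multiples of the grid side length.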

class colossalai.nn.layer.parallel_2p5d.layers.LayerNorm2p5D(normalized_shape, eps=1e-05, dtype=None)

Layer Normalization for 2.5D parallelism

Parameters

normalized_shape (int) – input shape from an expected input of size \([* \times \text{normalized_shape}[0] \times \text{normalized_shape}[1] \times \ldots \times \text{normalized_shape}[-1]]\). If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.
eps (float, optional) – a value added to the denominator for numerical stability, defaults to 1e-05
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
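
The role of eps and of normalizing over the last dimension can be shown on a single device. The following is a minimal sketch of the standard layer normalization formula in plain Python. It ignores the learnable scale and shift as well as any 2.5D partitioning, and `layer_norm_last_dim` is a hypothetical helper, not a library function.

```python
import math
from typing import List


def layer_norm_last_dim(x: List[float], eps: float = 1e-05) -> List[float]:
    """Normalize one vector (the last dimension) to zero mean and unit variance."""
    mean = sum(x) / len(x)
    # Biased variance (divide by N), as is conventional for layer normalization.
    var = sum((v - mean) ** 2 for v in x) / len(x)
    # eps keeps the denominator away from zero when the variance is tiny.
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


normalized = layer_norm_last_dim([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

With a constant input such as `[5.0, 5.0, 5.0]` the variance is zero, and eps is what prevents a division by zero.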

class colossalai.nn.layer.parallel_2p5d.layers.PatchEmbedding2p5D(img_size, patch_size, in_chans, embed_size, flatten=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>, position_embed_initializer=<function zeros_.<locals>.initializer>)

2D Image to Patch Embedding

Parameters

img_size (int) – image size
patch_size (int) – patch size
in_chans (int) – number of channels of input image
embed_size (int) – size of embedding
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
flatten (bool, optional) – whether to flatten output tensor, defaults to True
weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer
bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer
position_embed_initializer (Callable, optional) – The initializer of position embedding, defaults to zero
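
The relationship between img_size and patch_size can be sketched as follows. This is a minimal illustration, assuming square images and square, non-overlapping patches (the usual Vision Transformer setup); `num_patches` is a hypothetical helper and not part of the documented API.

```python
def num_patches(img_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    patches_per_side = img_size // patch_size
    return patches_per_side * patches_per_side


# Example: a 224 x 224 image cut into 16 x 16 patches.
count = num_patches(224, 16)
print(count)
```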

class colossalai.nn.layer.parallel_2p5d.layers.Embedding2p5D(num_embeddings, embedding_dim, padding_idx=None, dtype=None, weight_initializer=<function normal_.<locals>.initializer>, *args, **kwargs)

Embedding for 2.5D parallelism

Parameters

num_embeddings (int) – number of embeddings
embedding_dim (int) – dimension of embedding
padding_idx (int, optional) – index of padding, defaults to None
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
weight_initializer (Callable, optional) – The initializer of weight, defaults to normal initializer
args – Args used in F.embedding
kwargs – Kwargs used in F.embedding

class colossalai.nn.layer.parallel_2p5d.layers.VocabParallelEmbedding2p5D(num_embeddings, embedding_dim, padding_idx=None, dtype=None, weight_initializer=<function normal_.<locals>.initializer>, *args, **kwargs)

Embedding parallelized in the vocabulary dimension.

Parameters

num_embeddings (int) – number of embeddings
embedding_dim (int) – dimension of embedding
padding_idx (int, optional) – index of padding, defaults to None
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
weight_initializer (Callable, optional) – The initializer of weight, defaults to normal initializer
args – Args used in F.embedding
kwargs – Kwargs used in F.embedding
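
Parallelizing in the vocabulary dimension means each partition stores only a slice of the embedding table. The following is a minimal sketch of that idea, assuming the vocabulary is divided into equal contiguous ranges, one per partition. The helper names and the `rank` and `num_partitions` parameters are hypothetical; the actual partitioning used by this class is not described on this page.

```python
from typing import Optional, Tuple


def vocab_range(num_embeddings: int, rank: int, num_partitions: int) -> Tuple[int, int]:
    """Half-open range [start, end) of token ids owned by one partition."""
    if num_embeddings % num_partitions != 0:
        raise ValueError("num_embeddings must be divisible by num_partitions")
    per_partition = num_embeddings // num_partitions
    start = rank * per_partition
    return start, start + per_partition


def local_index(token_id: int, num_embeddings: int, rank: int, num_partitions: int) -> Optional[int]:
    """Row in the local shard for token_id, or None if another partition owns it."""
    start, end = vocab_range(num_embeddings, rank, num_partitions)
    if start <= token_id < end:
        return token_id - start
    return None


# Example: a vocabulary of 50304 tokens split over 4 partitions.
print(vocab_range(50304, 1, 4))
print(local_index(12580, 50304, 1, 4))
```

In a scheme like this, lookups for token ids owned by other partitions are typically masked out locally and the partial results are combined across partitions.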

class colossalai.nn.layer.parallel_2p5d.layers.Classifier2p5D(in_features, num_classes, weight=None, bias=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)

Classifier for 2.5D parallelism

Parameters

in_features (int) – size of each input sample
num_classes (int) – number of classes
weight (torch.nn.Parameter, optional) – weight of the classifier, defaults to None
bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer
bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer

class colossalai.nn.layer.parallel_2p5d.layers.VocabParallelClassifier2p5D(in_features, num_classes, weight=None, bias=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)

Vocab parallel classifier layer for 2.5D parallelism

Parameters

in_features (int) – size of each input sample
num_classes (int) – number of classes
weight (torch.nn.Parameter, optional) – weight of the classifier, defaults to None
bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer
bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer