colossalai.nn.layer.colossalai_layer

class colossalai.nn.layer.colossalai_layer.Linear(in_features, out_features, bias=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>, **kwargs)

Linear layer of colossalai.

Parameters
  • in_features (int) – size of each input sample.

  • out_features (int) – size of each output sample.

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True.

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None.

  • weight_initializer (typing.Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer.

  • bias_initializer (typing.Callable, optional) – The initializer of bias, defaults to xavier uniform initializer.

Note: the kwargs accepted depend on which tensor parallelism mode is in use.

The supported kwargs for each mode are:

Linear1D:
    gather_output: bool (optional, defaults to False)
    skip_bias_add: bool (optional, defaults to False)
Linear2D:
    skip_bias_add: bool (optional, defaults to False)
Linear2p5D:
    skip_bias_add: bool (optional, defaults to False)
Linear3D:
    None (no additional kwargs)
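
A minimal pure-Python sketch of how these per-mode kwargs and the skip_bias_add flag could behave. The EXTRA_KWARGS table, the mode strings ("1d", "2d", "2.5d", "3d") and both helper functions are hypothetical illustrations, not ColossalAI APIs. The skip_bias_add branch assumes the Megatron-style convention of returning the bias unadded so that a later fused kernel can apply it.

```python
from typing import Dict, List, Optional, Tuple, Union

Matrix = List[List[float]]

# Hypothetical table mirroring the per-mode kwargs documented above.
# Values are the documented defaults; "3d" accepts no extra kwargs.
EXTRA_KWARGS: Dict[str, Dict[str, bool]] = {
    "1d": {"gather_output": False, "skip_bias_add": False},
    "2d": {"skip_bias_add": False},
    "2.5d": {"skip_bias_add": False},
    "3d": {},
}


def resolve_kwargs(mode: str, **kwargs: bool) -> Dict[str, bool]:
    """Fill in documented defaults and reject kwargs the mode does not list."""
    allowed = EXTRA_KWARGS[mode]
    unknown = set(kwargs) - set(allowed)
    if unknown:
        raise TypeError(f"mode {mode!r} does not accept {sorted(unknown)}")
    return {**allowed, **kwargs}


def linear_forward(
    x: Matrix, weight: Matrix, bias: Optional[List[float]], skip_bias_add: bool = False
) -> Union[Matrix, Tuple[Matrix, List[float]]]:
    """Compute x @ weight^T (+ bias); weight has shape (out_features, in_features)."""
    out = [[sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in weight] for row in x]
    if bias is None:
        return out
    if skip_bias_add:
        # Assumed convention: return the bias separately instead of adding it.
        return out, bias
    return [[o + b for o, b in zip(row, bias)] for row in out]


print(resolve_kwargs("1d", gather_output=True))
# {'gather_output': True, 'skip_bias_add': False}
print(linear_forward([[1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.5, 0.5, 0.5]))
# [[1.5, 2.5, 3.5]]
```

Keeping the mode table as data makes the "which kwargs apply to which mode" rule explicit in one place instead of scattering it across branches.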

For more details about initializers, please refer to init.
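
The default reprs in the signature (kaiming_uniform_.<locals>.initializer) show that initializers are closures returned by factory functions. A rough pure-Python sketch of that pattern follows. It assumes the closure receives a shape and the fan-in. kaiming_uniform below is a hypothetical stand-in using the standard He uniform bound gain * sqrt(3 / fan_in) with the leaky-ReLU gain sqrt(2 / (1 + a^2)), as in PyTorch's torch.nn.init.kaiming_uniform_. It is not the colossalai.nn.init implementation, whose exact arguments are not stated here.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]
Initializer = Callable[..., Matrix]


def kaiming_uniform(a: float = 0.0) -> Initializer:
    """Hypothetical factory: returns a closure that samples U(-bound, bound)."""

    def initializer(shape: Tuple[int, int], fan_in: int, rng: Optional[random.Random] = None) -> Matrix:
        rng = rng or random.Random(0)
        gain = math.sqrt(2.0 / (1.0 + a * a))  # leaky-ReLU gain with negative slope a
        bound = gain * math.sqrt(3.0 / fan_in)
        rows, cols = shape
        return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

    return initializer


init_fn = kaiming_uniform(a=math.sqrt(5))
w = init_fn((3, 4), fan_in=4)  # bound = 1 / sqrt(fan_in) = 0.5 when a = sqrt(5)
print(init_fn)  # repr contains kaiming_uniform.<locals>.initializer, like the signature defaults
```

Returning a closure lets the layer decide when to call the initializer (for example after it knows the local shard shape), while the caller only chooses the distribution.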

class colossalai.nn.layer.colossalai_layer.Classifier(in_features, num_classes, weight=None, bias=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>, vocab_parallel_limit=2048)

Classifier layer of colossalai.

Parameters
  • in_features (int) – size of each input sample.

  • num_classes (int) – number of classes.

  • weight (torch.nn.Parameter, optional) – weight of the classifier, defaults to None.

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True.

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None.

  • weight_initializer (typing.Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer.

  • bias_initializer (typing.Callable, optional) – The initializer of bias, defaults to xavier uniform initializer.

For more details about initializers, please refer to init.
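
A small pure-Python sketch of what the classifier computes, assuming it is an affine projection logits = x @ W^T + b with W of shape (num_classes, in_features). Passing an existing weight is commonly used to tie the output layer to an input embedding table; that use is an assumption here, and classifier_logits and predict are hypothetical helpers.

```python
from typing import List, Optional

Matrix = List[List[float]]


def classifier_logits(x: Matrix, weight: Matrix, bias: Optional[List[float]] = None) -> Matrix:
    """logits[i][c] = sum_k x[i][k] * weight[c][k] (+ bias[c])."""
    logits = [[sum(xk * wk for xk, wk in zip(row, w_c)) for w_c in weight] for row in x]
    if bias is not None:
        logits = [[z + b for z, b in zip(row, bias)] for row in logits]
    return logits


def predict(logits: Matrix) -> List[int]:
    """Index of the largest logit per sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Weight tying: reuse a (num_embeddings x embedding_dim) table as the classifier weight,
# so num_classes == num_embeddings and in_features == embedding_dim.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [[2.0, 3.0]]
logits = classifier_logits(hidden, embedding_table)
print(logits, predict(logits))  # [[2.0, 3.0, 5.0]] [2]
```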

class colossalai.nn.layer.colossalai_layer.Embedding(num_embeddings, embedding_dim, padding_idx=None, dtype=None, weight_initializer=<function normal_.<locals>.initializer>, vocab_parallel_limit=2048, *args, **kwargs)

Embedding layer of colossalai.

Parameters
  • num_embeddings (int) – number of embeddings.

  • embedding_dim (int) – dimension of embedding.

  • padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”, defaults to None.

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None.

  • weight_initializer (typing.Callable, optional) – The initializer of weight, defaults to normal initializer.

The args and kwargs are passed to torch.nn.functional.embedding and may contain:

max_norm (float, optional): If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. Note: this will modify weight in-place.
norm_type (float, optional): The p of the p-norm to compute for the max_norm option. Default 2.
scale_grad_by_freq (bool, optional): If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default False.
sparse (bool, optional): If True, gradient w.r.t. weight will be a sparse tensor. Default False.
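
The max_norm behaviour above can be illustrated in pure Python. The sketch follows PyTorch's documented semantics: rows that are looked up and whose p-norm exceeds max_norm are rescaled in place before the lookup. embedding_lookup is a hypothetical helper, and the small 1e-7 added to the norm mirrors PyTorch's renormalization but should be treated as an implementation detail.

```python
from typing import List, Optional

Matrix = List[List[float]]


def embedding_lookup(
    weight: Matrix, indices: List[int], max_norm: Optional[float] = None, norm_type: float = 2.0
) -> Matrix:
    """Return weight[i] for each index, renormalizing referenced rows in place."""
    if max_norm is not None:
        for i in set(indices):  # each referenced row is renormalized once
            row = weight[i]
            norm = sum(abs(v) ** norm_type for v in row) ** (1.0 / norm_type)
            if norm > max_norm:
                scale = max_norm / (norm + 1e-7)
                weight[i] = [v * scale for v in row]  # modifies the table in place
    return [list(weight[i]) for i in indices]


table = [[3.0, 4.0], [0.3, 0.4]]  # row norms 5.0 and 0.5
out = embedding_lookup(table, [0, 1, 0], max_norm=1.0)
print([[round(v, 4) for v in row] for row in out])
# [[0.6, 0.8], [0.3, 0.4], [0.6, 0.8]]
```

Note that the table itself is changed: after the call, table[0] is also roughly [0.6, 0.8], which is the in-place side effect the max_norm description warns about.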

More details about args and kwargs can be found in the PyTorch Embedding documentation.

For more details about initializers, please refer to init.
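
The signature also exposes vocab_parallel_limit, which is not described above. Judging by its name, it presumably switches to a vocabulary-parallel embedding once num_embeddings exceeds the limit; treat that as an assumption. The sketch below simulates the vocab-parallel technique popularized by Megatron-LM in a single process: each simulated rank owns a contiguous slice of rows, contributes zeros for tokens outside its slice, and summing the partial outputs stands in for the all-reduce. Both helpers are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def shard_vocab(weight: Matrix, world_size: int) -> List[Tuple[int, Matrix]]:
    """Split rows into world_size contiguous shards; returns (start_row, rows) per rank."""
    per_rank = len(weight) // world_size
    assert per_rank * world_size == len(weight), "assumes an evenly divisible vocabulary"
    return [(r * per_rank, weight[r * per_rank:(r + 1) * per_rank]) for r in range(world_size)]


def vocab_parallel_lookup(shards: List[Tuple[int, Matrix]], indices: List[int], dim: int) -> Matrix:
    """Each rank looks up only the tokens it owns; partial results are summed."""
    total = [[0.0] * dim for _ in indices]
    for start, rows in shards:  # loop body = work done on one simulated rank
        for pos, tok in enumerate(indices):
            if start <= tok < start + len(rows):
                local = rows[tok - start]
                total[pos] = [t + v for t, v in zip(total[pos], local)]  # stands in for all-reduce(sum)
            # tokens outside [start, start + len(rows)) contribute zeros on this rank
    return total


weight = [[float(i), float(-i)] for i in range(8)]  # 8 embeddings, dim 2
shards = shard_vocab(weight, world_size=4)
ids = [7, 0, 3]
print(vocab_parallel_lookup(shards, ids, dim=2) == [weight[i] for i in ids])  # True
```

Sharding by vocabulary keeps each rank's slice of a very large table small, at the cost of one reduction over the output.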

class colossalai.nn.layer.colossalai_layer.PatchEmbedding(img_size, patch_size, in_chans, embed_size, dtype=None, flatten=True, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>, position_embed_initializer=<function zeros_.<locals>.initializer>)

2D Image to Patch Embedding.

Parameters
  • img_size (int) – image size.

  • patch_size (int) – patch size.

  • in_chans (int) – number of channels of input image.

  • embed_size (int) – size of embedding.

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None.

  • flatten (bool, optional) – whether to flatten the output tensor, defaults to True.

  • weight_initializer (typing.Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer.

  • bias_initializer (typing.Callable, optional) – The initializer of bias, defaults to xavier uniform initializer.

  • position_embed_initializer (typing.Callable, optional) – The initializer of position embedding, defaults to zeros initializer.

For more details about initializers, please refer to init.
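
As a rough pure-Python sketch of the patch-embedding idea, assuming the standard ViT design: the image is cut into non-overlapping patch_size x patch_size patches (equivalently, a convolution whose kernel size and stride both equal patch_size), and each flattened patch is mapped linearly to embed_size values. patchify and embed_patches are hypothetical helpers; the position embedding, and any extra tokens the layer may add, are omitted.

```python
from typing import List

Image = List[List[List[float]]]  # (in_chans, img_size, img_size)
Matrix = List[List[float]]


def patchify(image: Image, patch_size: int) -> Matrix:
    """Return (img_size // patch_size) ** 2 patch vectors of length in_chans * patch_size ** 2."""
    in_chans, img_size = len(image), len(image[0])
    assert img_size % patch_size == 0, "assumes img_size is a multiple of patch_size"
    grid = img_size // patch_size
    patches = []
    for gy in range(grid):  # row-major order over the patch grid
        for gx in range(grid):
            patches.append([
                image[c][gy * patch_size + dy][gx * patch_size + dx]
                for c in range(in_chans)
                for dy in range(patch_size)
                for dx in range(patch_size)
            ])
    return patches


def embed_patches(patches: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Apply the same linear map to every patch; weight is (embed_size, in_chans * patch_size ** 2)."""
    return [[sum(w * v for w, v in zip(w_row, p)) + b for w_row, b in zip(weight, bias)] for p in patches]


img = [[[float(4 * y + x) for x in range(4)] for y in range(4)]]  # 1 channel, 4 x 4
patches = patchify(img, patch_size=2)
print(len(patches), patches[0])  # 4 [0.0, 1.0, 4.0, 5.0]
tokens = embed_patches(patches, weight=[[1.0] * 4, [0.25] * 4], bias=[0.0, 0.0])
print(tokens[0])  # [10.0, 2.5]
```

The per-image result here is laid out as (num_patches, embed_size), which corresponds to the flattened output; with flatten=False such a layer would presumably keep the convolution's (embed_size, grid, grid) layout instead.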

class colossalai.nn.layer.colossalai_layer.LayerNorm(normalized_shape, eps=1e-05, dtype=None)

Layer Normalization for colossalai.

Parameters
  • normalized_shape (int) – input shape from an expected input of size \([* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]\). If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.

  • eps (float, optional) – a value added to the denominator for numerical stability, defaults to 1e-05.

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None.
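
A pure-Python sketch of the normalization over the last dimension, following torch.nn.LayerNorm semantics: biased variance, eps added inside the square root, then an elementwise affine transform. It assumes the layer's weight and bias start at ones and zeros, as in PyTorch; layer_norm and layer_norm_rows are hypothetical helpers.

```python
import math
from typing import List, Optional


def layer_norm(
    x: List[float], eps: float = 1e-05, weight: Optional[List[float]] = None, bias: Optional[List[float]] = None
) -> List[float]:
    """Normalize one vector whose length equals normalized_shape (the last dimension)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance, as in LayerNorm
    inv_std = 1.0 / math.sqrt(var + eps)
    weight = weight if weight is not None else [1.0] * n
    bias = bias if bias is not None else [0.0] * n
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, weight, bias)]


def layer_norm_rows(rows: List[List[float]], eps: float = 1e-05) -> List[List[float]]:
    """Apply layer_norm independently to every row, i.e. over the last dimension only."""
    return [layer_norm(r, eps) for r in rows]


print([round(v, 4) for v in layer_norm([1.0, 2.0, 3.0])])  # [-1.2247, 0.0, 1.2247]
```

Each row is normalized with its own statistics, so unlike batch normalization the result does not depend on other samples in the batch.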

class colossalai.nn.layer.colossalai_layer.Dropout(p=0.5, inplace=False)

Dropout layer of colossalai.

Parameters
  • p (float, optional) – probability of an element to be zeroed, defaults to 0.5.

  • inplace (bool, optional) – whether to perform dropout in-place, defaults to False.
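
A pure-Python sketch of the standard (inverted) dropout that torch.nn.Dropout implements: during training each element is zeroed with probability p and survivors are scaled by 1 / (1 - p), while in evaluation the input passes through unchanged. It assumes this layer follows the same semantics; the training flag and rng argument are illustrative additions, not parameters of this class.

```python
import random
from typing import List, Optional


def dropout(
    x: List[float],
    p: float = 0.5,
    inplace: bool = False,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Zero each element with probability p and rescale the rest by 1 / (1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"dropout probability has to be between 0 and 1, but got {p}")
    out = x if inplace else list(x)  # inplace=True reuses the caller's list
    if not training or p == 0.0:
        return out  # identity outside training
    rng = rng or random.Random(0)
    scale = 0.0 if p == 1.0 else 1.0 / (1.0 - p)  # avoid dividing by zero when p == 1
    for i, v in enumerate(out):
        out[i] = 0.0 if rng.random() < p else v * scale
    return out


data = [1.0] * 8
result = dropout(data, p=0.5, inplace=True, rng=random.Random(42))
print(result is data, set(result) <= {0.0, 2.0})  # True True
```

Scaling the kept elements during training keeps the expected value of each activation unchanged, so no rescaling is needed at evaluation time.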