colossalai.nn.layer.parallel_1d

class colossalai.nn.layer.parallel_1d.Linear1D(in_features, out_features, bias=True, dtype=None, gather_output=False, skip_bias_add=False, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)

Linear layer for 1D parallelism

Parameters
  • in_features (int) – size of each input sample

  • out_features (int) – size of each output sample

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None

  • skip_bias_add (bool, optional) – If set to True, the layer skips the bias addition so that the bias can be fused into a later kernel, defaults to False

  • weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer

  • bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer
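The effect of skip_bias_add can be pictured with a small pure-Python sketch. It assumes that, when the flag is set, the layer hands back the un-biased output together with the bias so that a later fused kernel can add it. The helpers `matmul`, `linear` and `fused_bias_relu` are hypothetical stand-ins and not part of colossalai.

```python
def matmul(x, a):
    """Multiply x (n x k) by a (k x m); both are lists of rows."""
    return [[sum(xv * av for xv, av in zip(row, col)) for col in zip(*a)]
            for row in x]


def linear(x, a, b, skip_bias_add=False):
    """Toy linear layer Y = XA + b (hypothetical helper, not the library API)."""
    y = matmul(x, a)
    if skip_bias_add:
        # Assumption: the bias is returned unapplied so the caller can fuse it.
        return y, b
    return [[v + bv for v, bv in zip(row, b)] for row in y]


def fused_bias_relu(y, b):
    """Stand-in for a fused kernel: bias add and ReLU in a single pass."""
    return [[max(v + bv, 0) for v, bv in zip(row, b)] for row in y]


X = [[1, -2]]
A = [[1, 0], [0, 1]]
b = [1, 1]

# With skip_bias_add=True the bias is applied later, inside the fused step.
y_no_bias, bias = linear(X, A, b, skip_bias_add=True)
fused = fused_bias_relu(y_no_bias, bias)

# Reference path: bias added by the layer, ReLU applied separately.
unfused = [[max(v, 0) for v in row] for row in linear(X, A, b)]
```

Both paths produce the same values; the fused path simply defers the bias addition to the step that consumes it.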

class colossalai.nn.layer.parallel_1d.Linear1D_Col(in_features, out_features, bias=True, dtype=None, gather_output=False, skip_bias_add=False, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)

Linear layer with column parallelism.

The linear layer is defined as \(Y = XA + b\). A is parallelized along its second dimension as \(A = [A_1, ..., A_p]\).

Parameters
  • in_features (int) – first dimension of matrix A.

  • out_features (int) – second dimension of matrix A.

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None

  • gather_output (bool, optional) – If True, call all-gather on the output and make Y available to all GPUs; otherwise, every GPU keeps its own output \(Y_i = XA_i\), defaults to False

  • skip_bias_add (bool, optional) – If set to True, the layer skips the bias addition so that the bias can be fused into a later kernel, defaults to False

  • weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer

  • bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer
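The column split \(A = [A_1, ..., A_p]\) can be illustrated with a single-process, pure-Python sketch that simulates p ranks in a loop. It is not the library implementation: the helpers `matmul` and `column_parallel_linear` are hypothetical, and sharding the bias alongside the columns of A is an assumption made for the illustration.

```python
def matmul(x, a):
    """Multiply x (n x k) by a (k x m); both are lists of rows."""
    return [[sum(xv * av for xv, av in zip(row, col)) for col in zip(*a)]
            for row in x]


def column_parallel_linear(x, a, b, p, gather_output=False):
    """Simulate Y = XA + b with A split into p column blocks (one per rank)."""
    width = len(a[0]) // p  # columns held by each simulated rank
    outputs = []
    for rank in range(p):
        lo, hi = rank * width, (rank + 1) * width
        a_i = [row[lo:hi] for row in a]  # column block A_i
        b_i = b[lo:hi]                   # assumption: bias is sharded the same way
        y_i = [[v + bv for v, bv in zip(row, b_i)] for row in matmul(x, a_i)]
        outputs.append(y_i)
    if gather_output:
        # Stand-in for all-gather: concatenate the blocks along the last dim.
        return [sum((y_i[r] for y_i in outputs), []) for r in range(len(x))]
    return outputs  # one partial output Y_i per rank


X = [[1, 2], [3, 4]]
A = [[1, 0, 2, 1], [0, 1, 1, 3]]
b = [1, -1, 0, 2]

per_rank = column_parallel_linear(X, A, b, p=2)
gathered = column_parallel_linear(X, A, b, p=2, gather_output=True)
full = [[v + bv for v, bv in zip(row, b)] for row in matmul(X, A)]
```

With gather_output=True the concatenated result matches the unsplit layer; without it, each rank only holds its own column block of Y.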

class colossalai.nn.layer.parallel_1d.Linear1D_Row(in_features, out_features, bias=True, dtype=None, parallel_input=True, skip_bias_add=False, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)

Linear layer with row parallelism

Parameters
  • in_features (int) – size of each input sample

  • out_features (int) – size of each output sample

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None

  • parallel_input (bool, optional) – If set to True, it’s assumed that the input is split, defaults to True

  • skip_bias_add (bool, optional) – If set to True, the layer skips the bias addition so that the bias can be fused into a later kernel, defaults to False

  • weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer

  • bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer
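Row parallelism can be sketched in the same single-process style. The sketch assumes the usual scheme in which A is split along its first dimension, the input is split along its last dimension, and the per-rank partial products are summed (the role an all-reduce would play) before the bias is added once. The helpers `matmul` and `row_parallel_linear` are hypothetical and not part of colossalai.

```python
def matmul(x, a):
    """Multiply x (n x k) by a (k x m); both are lists of rows."""
    return [[sum(xv * av for xv, av in zip(row, col)) for col in zip(*a)]
            for row in x]


def row_parallel_linear(x, a, b, p, parallel_input=True):
    """Simulate Y = XA + b with A split into p row blocks (one per rank)."""
    rows = len(a) // p  # rows of A held by each simulated rank
    a_shards = [a[r * rows:(r + 1) * rows] for r in range(p)]
    if parallel_input:
        x_shards = x  # already a list of p blocks, split along the last dim
    else:
        x_shards = [[row[r * rows:(r + 1) * rows] for row in x]
                    for r in range(p)]
    partials = [matmul(x_i, a_i) for x_i, a_i in zip(x_shards, a_shards)]
    n, m = len(partials[0]), len(partials[0][0])
    # Stand-in for all-reduce: sum the partial products, then add the bias once.
    return [[sum(part[i][j] for part in partials) + b[j] for j in range(m)]
            for i in range(n)]


X = [[1, 2, 3, 4], [5, 6, 7, 8]]
A = [[1, 0], [0, 1], [2, 1], [1, 3]]
b = [1, -1]

# Full input: the sketch splits it before the per-rank products.
y_from_full = row_parallel_linear(X, A, b, p=2, parallel_input=False)

# Pre-split input, e.g. the ungathered output of a column-parallel layer.
X_split = [[[1, 2], [5, 6]], [[3, 4], [7, 8]]]
y_from_split = row_parallel_linear(X_split, A, b, p=2, parallel_input=True)
```

Leaving the input split is what allows a column-parallel layer with gather_output=False to feed a row-parallel layer without an intermediate gather.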

class colossalai.nn.layer.parallel_1d.Embedding1D(num_embeddings, embedding_dim, padding_idx=None, dtype=None, weight_initializer=<function normal_.<locals>.initializer>, *args, **kwargs)

Embedding for 1D parallelism

Parameters
  • num_embeddings (int) – number of embeddings

  • embedding_dim (int) – dimension of embedding

  • padding_idx (int, optional) – index of padding, defaults to None

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None

  • weight_initializer (Callable, optional) – The initializer of weight, defaults to normal initializer

  • args – Args used in F.embedding

  • kwargs – Kwargs used in F.embedding

class colossalai.nn.layer.parallel_1d.Dropout1D(p=0.5, inplace=False)

Dropout layer of 1D parallelism

Parameters
  • p (float, optional) – dropout rate, defaults to 0.5

  • inplace (bool, optional) – If set to True, will do this operation in-place, defaults to False

class colossalai.nn.layer.parallel_1d.Classifier1D(in_features, num_classes, weight=None, bias=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)

Classifier for 1D parallelism: a row-parallel linear layer that can use a given weight.

Parameters
  • in_features (int) – size of input features

  • num_classes (int) – number of classes in the dataset

  • weight (torch.nn.Parameter, optional) – weight of the classifier, defaults to None

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None

  • weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer

  • bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer

class colossalai.nn.layer.parallel_1d.VocabParallelClassifier1D(in_features, num_classes, weight=None, bias=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)

Classifier for 1D parallelism: a column-parallel linear layer that can use a given weight.

Parameters
  • in_features (int) – size of input features

  • num_classes (int) – number of classes in the dataset

  • weight (torch.nn.Parameter, optional) – weight of the classifier, defaults to None

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None

  • weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer

  • bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer

class colossalai.nn.layer.parallel_1d.VocabParallelEmbedding1D(num_embeddings, embedding_dim, padding_idx=None, dtype=None, weight_initializer=<function normal_.<locals>.initializer>, *args, **kwargs)

Embedding parallelized in the vocabulary dimension.

Parameters
  • num_embeddings (int) – number of embeddings

  • embedding_dim (int) – dimension of embedding

  • padding_idx (int, optional) – index of padding, defaults to None

  • dtype (torch.dtype, optional) – The dtype of parameters, defaults to None

  • weight_initializer (Callable, optional) – The initializer of weight, defaults to normal initializer

  • args – Args used in F.embedding

  • kwargs – Kwargs used in F.embedding
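Splitting an embedding along the vocabulary dimension can be pictured with the single-process sketch below. It assumes a common scheme: each rank stores a contiguous slice of the embedding rows, returns zeros for token ids outside its slice, and the per-rank results are summed (the role an all-reduce would play). The helper `vocab_parallel_embedding` is hypothetical and not part of colossalai.

```python
def vocab_parallel_embedding(token_ids, weight, p):
    """Simulate a lookup in a table whose rows are split across p ranks."""
    per_rank = len(weight) // p  # vocabulary entries held by each rank
    dim = len(weight[0])
    partials = []
    for rank in range(p):
        start, end = rank * per_rank, (rank + 1) * per_rank
        local_weight = weight[start:end]  # this rank's slice of the table
        out = []
        for t in token_ids:
            if start <= t < end:
                out.append(list(local_weight[t - start]))  # local index
            else:
                out.append([0.0] * dim)  # id owned by another rank: mask to zero
        partials.append(out)
    # Stand-in for all-reduce: exactly one rank contributes a non-zero row.
    return [[sum(part[i][d] for part in partials) for d in range(dim)]
            for i in range(len(token_ids))]


# Toy table: 4 vocabulary entries, embedding_dim = 3.
weight = [[float(10 * v + d) for d in range(3)] for v in range(4)]
ids = [3, 0, 2]

embedded = vocab_parallel_embedding(ids, weight, p=2)
```

Because every token id falls in exactly one rank's slice, the summed result equals a lookup in the unsplit table.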