colossalai.nn.layer.parallel_3d
- colossalai.nn.layer.parallel_3d.reduce_by_batch_3d(tensor, input_parallel_mode, weight_parallel_mode, reduce_mean=False)
All-reduce the input from the model parallel region.
- Parameters
tensor (torch.Tensor) – input matrix
input_parallel_mode (colossalai.context.parallel_mode.ParallelMode) – input parallel mode
weight_parallel_mode (colossalai.context.parallel_mode.ParallelMode) – weight parallel mode
reduce_mean (bool, optional) – If set to True, it will divide the output by (input parallel size * weight parallel size), defaults to False
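
The effect of reduce_mean can be illustrated with a minimal single-process sketch in plain Python. It does not call ColossalAI or any distributed backend: the helper simulate_reduce_by_batch and the per-rank values are hypothetical, and the sketch only assumes that the reduction is a sum over all ranks in the input and weight groups, followed by the optional division described above.

```python
def simulate_reduce_by_batch(per_rank_values, input_size, weight_size, reduce_mean=False):
    """Hypothetical stand-in for the reduction: sum one scalar per rank.

    per_rank_values holds one value for each of the
    input_size * weight_size ranks that take part in the reduction.
    """
    expected = input_size * weight_size
    if len(per_rank_values) != expected:
        raise ValueError(f"expected {expected} values, got {len(per_rank_values)}")
    total = sum(per_rank_values)  # stands in for the all-reduce (sum)
    if reduce_mean:
        total = total / expected  # divide by input size * weight size
    return total


values = [1.0, 2.0, 3.0, 4.0]  # 2 input ranks x 2 weight ranks
summed = simulate_reduce_by_batch(values, 2, 2)
averaged = simulate_reduce_by_batch(values, 2, 2, reduce_mean=True)
```

With reduce_mean=True the result is the mean over all participating ranks rather than the sum, which is the usual choice when each rank holds a partial loss or metric for its share of the batch.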
- colossalai.nn.layer.parallel_3d.split_tensor_3d(tensor, dim, parallel_mode)
Splits a 3D parallel tensor along the specified dimension
- Parameters
tensor (torch.Tensor) – Input tensor
dim (int) – Specified dimension in which to split
parallel_mode (colossalai.context.parallel_mode.ParallelMode) – Parallel mode
weight_parallel_mode – Weight parallel mode
- Return output
Split tensor
- Rtype output
torch.Tensor
- colossalai.nn.layer.parallel_3d.split_batch_3d(input_, dim=0, input_parallel_mode=ParallelMode.PARALLEL_3D_INPUT, weight_parallel_mode=ParallelMode.PARALLEL_3D_WEIGHT)
Splits a 3D tensor along the batch dimension
- Parameters
input_ (torch.Tensor) – Input tensor
dim (int, optional) – Specified dimension in which to split
input_parallel_mode (colossalai.context.parallel_mode.ParallelMode, optional) – Input parallel mode
weight_parallel_mode (colossalai.context.parallel_mode.ParallelMode, optional) – Weight parallel mode
- Return output
Split tensor
- Rtype output
torch.Tensor
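
As a rough illustration of a batch split over two parallel groups, the following plain-Python sketch chunks a batch in two stages. The helpers chunk and simulate_split_batch are hypothetical. The order of the two stages (weight group first, then input group) and the requirement that the batch divide evenly are assumptions of this sketch, not documented behaviour.

```python
def chunk(rows, num_chunks, index):
    """Return the index-th of num_chunks equal slices of a list."""
    if len(rows) % num_chunks != 0:
        raise ValueError("length must be divisible by the number of chunks")
    size = len(rows) // num_chunks
    return rows[index * size:(index + 1) * size]


def simulate_split_batch(batch, weight_size, weight_rank, input_size, input_rank):
    """Hypothetical two-stage split along the batch dimension (dim=0)."""
    part = chunk(batch, weight_size, weight_rank)  # split across the weight group (assumed order)
    return chunk(part, input_size, input_rank)     # then across the input group


batch = list(range(8))  # 8 samples, identified by their index
shards = {
    (w, i): simulate_split_batch(batch, 2, w, 2, i)
    for w in range(2)
    for i in range(2)
}
```

Each (weight rank, input rank) pair ends up with a disjoint slice of the batch, and the slices together cover every sample exactly once.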
- class colossalai.nn.layer.parallel_3d.Linear3D(in_features, out_features, bias=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)
Linear layer for 3D parallelism
- Parameters
in_features (int) – size of each input sample
out_features (int) – size of each output sample
bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer
bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer
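
The default values printed in the signature, such as <function kaiming_uniform_.<locals>.initializer>, show that each initializer is a closure returned by a factory function. The following sketch shows that pattern in plain Python. The factory uniform_factory, its bound formula, and the (values, fan_in) call signature are hypothetical, since the arguments ColossalAI passes to an initializer are not documented in this section.

```python
import math
import random


def uniform_factory(bound_scale=1.0):
    """Hypothetical factory in the style of an initializer builder."""

    def initializer(values, fan_in):
        # Fill `values` in place with samples from U(-bound, bound),
        # where the bound shrinks as fan_in grows.
        bound = bound_scale * math.sqrt(3.0 / fan_in)
        for i in range(len(values)):
            values[i] = random.uniform(-bound, bound)
        return values

    return initializer


random.seed(0)
init_fn = uniform_factory()  # the callable a layer would receive
weights = init_fn([0.0] * 16, fan_in=4)
bound = math.sqrt(3.0 / 4)
```

Because the factory is called once and the returned closure is what gets passed in, hyperparameters such as the scale are fixed at construction time and the layer only needs to supply the tensor and its fan sizes.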
- class colossalai.nn.layer.parallel_3d.LayerNorm3D(normalized_shape, eps=1e-12, dtype=None)
Layer Normalization for 3D parallelism
- Parameters
normalized_shape – input shape from an expected input of size
\([* \times \text{normalized_shape}[0] \times \text{normalized_shape}[1] \times \ldots \times \text{normalized_shape}[-1]]\).
If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.
eps (float, optional) – a value added to the denominator for numerical stability, defaults to 1e-12
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
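
For reference, the role of eps can be seen in a minimal sketch of standard layer normalization over the last dimension, written in plain Python. The function layer_norm_last_dim is hypothetical. It leaves out any learnable scale or shift and does not model how LayerNorm3D distributes the computation across ranks.

```python
import math


def layer_norm_last_dim(row, eps=1e-12):
    """Normalize one row (the last dimension) to zero mean, unit variance."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)  # biased variance
    # eps keeps the denominator non-zero when every entry of the row is equal
    return [(x - mean) / math.sqrt(var + eps) for x in row]


out = layer_norm_last_dim([1.0, 2.0, 3.0, 4.0])
constant = layer_norm_last_dim([5.0, 5.0, 5.0])
```

For a constant row the variance is zero, so without eps the division would fail. With eps the output is a row of zeros.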
- class colossalai.nn.layer.parallel_3d.PatchEmbedding3D(img_size, patch_size, in_chans, embed_size, flatten=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>, position_embed_initializer=<function zeros_.<locals>.initializer>)
2D Image to Patch Embedding
- Parameters
img_size (int) – image size
patch_size (int) – patch size
in_chans (int) – number of channels of input image
embed_size (int) – size of embedding
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
flatten (bool, optional) – whether to flatten output tensor, defaults to True
weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer
bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer
position_embed_initializer (Callable, optional) – The initializer of position embedding, defaults to zero
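
The relationship between img_size, patch_size and the number of patches can be worked out with a short sketch. It assumes a square image cut into non-overlapping square patches, as in a typical vision transformer. The helper patch_grid is hypothetical, and the per-image shape shown for flatten=True is an assumption that ignores any extra tokens the layer may prepend.

```python
def patch_grid(img_size, patch_size):
    """Return (patches per side, total patches) for a square image."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    per_side = img_size // patch_size
    return per_side, per_side * per_side


per_side, num_patches = patch_grid(img_size=224, patch_size=16)
embed_size = 768
# Assumed per-image shape of the patch sequence when flatten=True
flattened_shape = (num_patches, embed_size)
```

Here a 224 x 224 image with 16 x 16 patches gives a 14 x 14 grid, so 196 patch embeddings per image.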
- class colossalai.nn.layer.parallel_3d.Classifier3D(in_features, num_classes, weight=None, bias=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)
Classifier for 3D parallelism
- Parameters
in_features (int) – size of each input sample
num_classes (int) – number of classes
weight (torch.nn.Parameter, optional) – weight of the classifier, defaults to None
bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer
bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer
- class colossalai.nn.layer.parallel_3d.Embedding3D(num_embeddings, embedding_dim, padding_idx=None, dtype=None, weight_initializer=<function normal_.<locals>.initializer>, *args, **kwargs)
Embedding for 3D parallelism
- Parameters
num_embeddings (int) – number of embeddings
embedding_dim (int) – dimension of embedding
padding_idx (int, optional) – index of padding, defaults to None
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
weight_initializer (Callable, optional) – The initializer of weight, defaults to normal initializer
args – Args used in F.embedding
kwargs – Kwargs used in F.embedding
- class colossalai.nn.layer.parallel_3d.VocabParallelEmbedding3D(num_embeddings, embedding_dim, padding_idx=None, dtype=None, weight_initializer=<function normal_.<locals>.initializer>, *args, **kwargs)
Embedding parallelized in the vocabulary dimension.
- Parameters
num_embeddings (int) – number of embeddings
embedding_dim (int) – dimension of embedding
padding_idx (int, optional) – index of padding, defaults to None
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
weight_initializer (Callable, optional) – The initializer of weight, defaults to normal initializer
args – Args used in F.embedding
kwargs – Kwargs used in F.embedding
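
One common way to parallelize an embedding in the vocabulary dimension is to give each rank a contiguous range of token ids, return zero vectors for ids outside that range, and sum the partial results across ranks. The sketch below simulates that scheme in a single process with plain Python lists. Whether VocabParallelEmbedding3D follows exactly this scheme is an assumption, and the helpers vocab_range and local_lookup are hypothetical.

```python
def vocab_range(num_embeddings, world_size, rank):
    """Contiguous [start, end) slice of token ids owned by one rank."""
    if num_embeddings % world_size != 0:
        raise ValueError("num_embeddings must be divisible by world_size")
    per_rank = num_embeddings // world_size
    return rank * per_rank, (rank + 1) * per_rank


def local_lookup(token_ids, table_shard, start, end, embedding_dim):
    """Look up ids owned by this rank; return zero vectors for the rest."""
    zero = [0.0] * embedding_dim
    return [
        list(table_shard[t - start]) if start <= t < end else list(zero)
        for t in token_ids
    ]


num_embeddings, embedding_dim, world_size = 8, 2, 2
# Full table, row i = [i, i]; each simulated rank keeps only its slice of rows.
full_table = [[float(i)] * embedding_dim for i in range(num_embeddings)]
token_ids = [1, 6, 3]

partials = []
for rank in range(world_size):
    start, end = vocab_range(num_embeddings, world_size, rank)
    shard = full_table[start:end]
    partials.append(local_lookup(token_ids, shard, start, end, embedding_dim))

# Element-wise sum over ranks stands in for the all-reduce.
combined = [
    [sum(p[i][j] for p in partials) for j in range(embedding_dim)]
    for i in range(len(token_ids))
]
```

Since exactly one rank owns each token id, the sum reproduces the rows a full, unsharded table would have returned, while each rank stores only num_embeddings / world_size rows.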
- class colossalai.nn.layer.parallel_3d.VocabParallelClassifier3D(in_features, num_classes, weight=None, bias=True, dtype=None, weight_initializer=<function kaiming_uniform_.<locals>.initializer>, bias_initializer=<function xavier_uniform_.<locals>.initializer>)
Vocab parallel classifier layer for 3D parallelism
- Parameters
in_features (int) – size of each input sample
num_classes (int) – number of classes
weight (torch.nn.Parameter, optional) – weight of the classifier, defaults to None
bias (bool, optional) – If set to False, the layer will not learn an additive bias, defaults to True
dtype (torch.dtype, optional) – The dtype of parameters, defaults to None
weight_initializer (Callable, optional) – The initializer of weight, defaults to kaiming uniform initializer
bias_initializer (Callable, optional) – The initializer of bias, defaults to xavier uniform initializer