colossalai.context.process_group_initializer.initializer_sequence
- class colossalai.context.process_group_initializer.initializer_sequence.Initializer_Sequence_DP(*args, **kwargs)
A ProcessGroupInitializer for the gradient all-reduce in sequence parallelism.
In sequence parallelism, each GPU holds a full copy of the model weights, so the gradient all-reduce runs across all processes in the same pipeline stage.
- Parameters
rank (int) – Rank of the current process
world_size (int) – Total number of processes in the communication world
config (Config) – Running configuration
data_parallel_size (int) – Size of the data parallel group
pipeline_parallel_size (int) – Size of the pipeline parallel group
tensor_parallel_size (int) – Size of the tensor parallel group
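
The grouping described above ("all processes in the same pipeline stage") can be illustrated with a small, framework-free sketch. It is not the library's implementation: the helper names `sequence_dp_groups` and `group_for_rank` are hypothetical, and the sketch assumes that ranks are laid out contiguously by pipeline stage, so each stage owns `world_size // pipeline_parallel_size` consecutive ranks.

```python
from typing import List


def sequence_dp_groups(world_size: int, pipeline_parallel_size: int) -> List[List[int]]:
    """Return one list of ranks per pipeline stage (hypothetical helper).

    Assumption: ranks are assigned contiguously per pipeline stage.
    """
    if world_size % pipeline_parallel_size != 0:
        raise ValueError("world_size must be divisible by pipeline_parallel_size")
    # Every process in a stage holds a full weight copy, so the whole stage
    # forms a single gradient all-reduce group.
    group_size = world_size // pipeline_parallel_size
    return [
        list(range(stage * group_size, (stage + 1) * group_size))
        for stage in range(pipeline_parallel_size)
    ]


def group_for_rank(rank: int, world_size: int, pipeline_parallel_size: int) -> List[int]:
    """Return the all-reduce group that contains `rank` (hypothetical helper)."""
    for ranks in sequence_dp_groups(world_size, pipeline_parallel_size):
        if rank in ranks:
            return ranks
    raise ValueError("rank is outside the communication world")


if __name__ == "__main__":
    # 8 processes split over 2 pipeline stages: two groups of 4 ranks each.
    print(sequence_dp_groups(8, 2))
    print(group_for_rank(5, 8, 2))
```

In a real run, each rank list would typically be passed to `torch.distributed.new_group` to create the communicator; that step is omitted here so the sketch runs without a distributed backend.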