colossalai.context.process_group_initializer.initializer_sequence
- class colossalai.context.process_group_initializer.initializer_sequence.Initializer_Sequence_DP(*args, **kwargs)
A ProcessGroupInitializer for sequence parallelism all-reduce.
In sequence parallelism, each GPU holds a full copy of the model weights, so gradient all-reduce occurs across all processes in the same pipeline stage.
- Parameters
args – Args used to initialize ProcessGroupInitializer
kwargs – Kwargs used to initialize ProcessGroupInitializer
- init_dist_group()
Initialize the sequence parallel process groups used for gradient all-reduce.
- Returns
(local_rank, group_world_size, process_group, ranks_in_group, mode)
- Return type
Tuple
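The grouping described above (one all-reduce group per pipeline stage) can be sketched in plain Python without any distributed backend. This is a minimal illustration and not the library's implementation: the helper names `sequence_dp_rank_groups` and `locate_rank` are hypothetical, and the contiguous rank layout per pipeline stage is an assumption.

```python
from typing import List, Tuple


def sequence_dp_rank_groups(world_size: int, pipeline_parallel_size: int) -> List[List[int]]:
    """Partition global ranks into one all-reduce group per pipeline stage.

    Assumption (not confirmed by the documentation): ranks are laid out
    contiguously, so stage 0 owns the first block of ranks, stage 1 the next.
    """
    if world_size % pipeline_parallel_size != 0:
        raise ValueError("world_size must be divisible by pipeline_parallel_size")
    group_size = world_size // pipeline_parallel_size
    return [
        list(range(stage * group_size, (stage + 1) * group_size))
        for stage in range(pipeline_parallel_size)
    ]


def locate_rank(rank: int, groups: List[List[int]]) -> Tuple[int, int, List[int]]:
    """Return (local_rank, group_world_size, ranks_in_group) for a global rank."""
    for ranks in groups:
        if rank in ranks:
            return ranks.index(rank), len(ranks), ranks
    raise ValueError(f"rank {rank} is not in any group")


# Example: 8 processes split across 2 pipeline stages.
groups = sequence_dp_rank_groups(world_size=8, pipeline_parallel_size=2)
local_rank, group_world_size, ranks_in_group = locate_rank(5, groups)
```

In a real run, each list of ranks would be passed to something like `torch.distributed.new_group` to obtain the `process_group` element of the returned tuple; that step is omitted here so the sketch runs on a single process.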
- class colossalai.context.process_group_initializer.initializer_sequence.Initializer_Sequence(*args, **kwargs)
A ProcessGroupInitializer for sequence parallelism.
- Parameters
args – Args used to initialize ProcessGroupInitializer
kwargs – Kwargs used to initialize ProcessGroupInitializer
- init_dist_group()
Initialize the sequence parallel process groups and assign local ranks and groups to each GPU.
Sequence parallelism requires two process groups. The first is used in the model's forward pass, where several processes exchange partial query, key, and value embeddings to compute self-attention values. The second is used for the all-reduce that keeps the model parameters synchronized.
- Returns
Settings for the sequence parallel process groups, one tuple per group
- Return type
list of Tuples (local_rank, group_world_size, process_group, ranks_in_group, mode)
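To show the shape of this return value, the sketch below models each tuple as a named tuple and indexes the list by mode. Everything here is illustrative: `GroupSetting`, `index_by_mode`, the string mode labels, and the numeric values are hypothetical stand-ins, and `process_group` is left as `None` because no distributed backend is initialized.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class GroupSetting(NamedTuple):
    """Hypothetical view of one (local_rank, ..., mode) tuple."""
    local_rank: int
    group_world_size: int
    process_group: Optional[Any]  # a distributed process group in a real run
    ranks_in_group: List[int]
    mode: str  # placeholder for the library's parallel-mode value


def index_by_mode(settings: List[GroupSetting]) -> Dict[str, GroupSetting]:
    """Build a lookup from mode label to the setting for that group."""
    return {setting.mode: setting for setting in settings}


# Invented values, used only to show the structure of the returned list:
# one entry for the forward-pass group and one for the all-reduce group.
settings = [
    GroupSetting(1, 4, None, [4, 5, 6, 7], "sequence"),
    GroupSetting(1, 4, None, [4, 5, 6, 7], "sequence_dp"),
]
by_mode = index_by_mode(settings)
forward_group_ranks = by_mode["sequence"].ranks_in_group
```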