colossalai.context.process_group_initializer.initializer_sequence

class colossalai.context.process_group_initializer.initializer_sequence.Initializer_Sequence_DP(*args, **kwargs)

A ProcessGroupInitializer for sequence parallelism all-reduce.

In sequence parallelism, each GPU holds a full copy of the model weights, so the gradient all-reduce occurs across all processes in the same pipeline stage.
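The grouping described above can be sketched in plain Python, without any distributed backend. This is only an illustration under stated assumptions: ranks are taken to be numbered contiguously within each pipeline stage, world_size is taken to be divisible by pipeline_parallel_size, and the helper name sequence_dp_groups is hypothetical and not part of ColossalAI.

```python
from typing import List


def sequence_dp_groups(world_size: int, pipeline_parallel_size: int) -> List[List[int]]:
    """Hypothetical helper: list the ranks that would all-reduce gradients together.

    Assumption: each pipeline stage owns a contiguous block of ranks.
    """
    if world_size <= 0 or pipeline_parallel_size <= 0:
        raise ValueError("world_size and pipeline_parallel_size must be positive")
    if world_size % pipeline_parallel_size != 0:
        raise ValueError("world_size must be divisible by pipeline_parallel_size")

    # Every process in a pipeline stage belongs to the same all-reduce group.
    group_size = world_size // pipeline_parallel_size
    return [
        list(range(stage * group_size, (stage + 1) * group_size))
        for stage in range(pipeline_parallel_size)
    ]


groups = sequence_dp_groups(world_size=8, pipeline_parallel_size=2)
print(groups)
```

With 8 processes and 2 pipeline stages, this sketch produces two groups of four ranks each, one group per stage.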

Parameters
  • rank (int) – The rank of the current process

  • world_size (int) – Size of the whole communication world

  • config (Config) – Running configuration

  • data_parallel_size (int) – Data parallel size

  • pipeline_parallel_size (int) – Pipeline parallel size

  • tensor_parallel_size (int) – Tensor parallel size

init_dist_group()

Initialize Sequence Parallel process groups used for gradient all-reduce.

Returns

A tuple (local_rank, group_world_size, process_group, ranks_in_group, mode).

Return type

Tuple
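To make the shape of the returned tuple more concrete, here is a rough sketch that computes the backend-independent fields (local_rank, group_world_size, ranks_in_group) for a single process. It assumes the same contiguous rank layout per pipeline stage, which the text above does not confirm. The function name describe_group is hypothetical, and process_group and mode are left out because they require a live distributed backend and library-specific types.

```python
from typing import List, Tuple


def describe_group(rank: int, world_size: int,
                   pipeline_parallel_size: int) -> Tuple[int, int, List[int]]:
    """Hypothetical helper: return (local_rank, group_world_size, ranks_in_group).

    Assumption: ranks are contiguous within each pipeline stage.
    """
    if world_size % pipeline_parallel_size != 0:
        raise ValueError("world_size must be divisible by pipeline_parallel_size")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")

    group_size = world_size // pipeline_parallel_size
    stage = rank // group_size  # pipeline stage that owns this rank
    ranks_in_group = list(range(stage * group_size, (stage + 1) * group_size))

    local_rank = ranks_in_group.index(rank)  # position of this rank in its group
    group_world_size = len(ranks_in_group)
    return local_rank, group_world_size, ranks_in_group


local_rank, group_world_size, ranks_in_group = describe_group(
    rank=5, world_size=8, pipeline_parallel_size=2
)
print(local_rank, group_world_size, ranks_in_group)
```

In this sketch, local_rank is the index of the process within its own group rather than its global rank, which is why it can be used to address peers inside the group.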