colossalai.context.process_group_initializer

class colossalai.context.process_group_initializer.Initializer_Tensor(*args, **kwargs)

A ProcessGroupInitializer for tensor parallelism.

Parameters
  • rank (int) – The rank of current process.

  • world_size (int) – Size of whole communication world.

  • config (Config) – Running configuration.

  • data_parallel_size (int) – Size of data parallel.

  • pipeline_parallel_size (int) – Size of pipeline parallel.

  • tensor_parallel_size (int) – Size of tensor parallel.

init_dist_group()

Initialize tensor parallel groups and assign local_ranks and groups to each GPU.

Returns

A tuple of tensor parallelism information.

Return type

Tuple (local_rank, group_world_size, process_group, ranks_in_group, mode)
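As a rough illustration of the grouping step, the sketch below partitions the world into tensor parallel groups and computes three fields of the returned tuple for one rank. The function name `tensor_group_for_rank` and the contiguous rank layout are assumptions made for this example; no real process group is created, and this is not presented as the library's implementation.

```python
def tensor_group_for_rank(rank, world_size, tensor_parallel_size):
    """Return (local_rank, group_world_size, ranks_in_group) for `rank`.

    Assumption (hypothetical layout): tensor parallel groups are contiguous
    blocks of ranks, e.g. [0..t-1], [t..2t-1], and so on.
    """
    if world_size % tensor_parallel_size != 0:
        raise ValueError("world_size must be divisible by tensor_parallel_size")
    # First global rank of the block that contains `rank`.
    start = (rank // tensor_parallel_size) * tensor_parallel_size
    ranks_in_group = list(range(start, start + tensor_parallel_size))
    local_rank = ranks_in_group.index(rank)
    return local_rank, len(ranks_in_group), ranks_in_group
```

Under these assumptions, `tensor_group_for_rank(5, 8, 4)` gives `(1, 4, [4, 5, 6, 7])`: rank 5 is the second member of the second block.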

class colossalai.context.process_group_initializer.Initializer_Sequence(*args, **kwargs)

A ProcessGroupInitializer for sequence parallelism.

Parameters
  • rank (int) – The rank of current process.

  • world_size (int) – Size of whole communication world.

  • config (Config) – Running configuration.

  • data_parallel_size (int) – Size of data parallel.

  • pipeline_parallel_size (int) – Size of pipeline parallel.

  • tensor_parallel_size (int) – Size of tensor parallel.

init_dist_group()

Initialize sequence parallel process groups and assign local_ranks and groups to each GPU.

Sequence parallelism requires two process groups. The first is used in the model's forward pass, where several processes exchange partial query, key, and value embeddings to compute self-attention values. The second is used for all-reduce to synchronize the model parameters.

Returns

Sequence parallelism information as a list of tuples.

Return type

List[Tuple (local_rank, group_world_size, process_group, ranks_in_group, mode)]
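The two-group structure can be sketched as follows. This is only an illustration under stated assumptions: the name `sequence_groups_for_rank`, the parameter `sequence_parallel_size`, the mode strings, and the rank layout (contiguous blocks for the forward group, strided ranks for the all-reduce group) are all hypothetical, and `None` stands in for a real process group handle.

```python
def sequence_groups_for_rank(rank, world_size, sequence_parallel_size):
    """Return two tuples shaped like
    (local_rank, group_world_size, process_group, ranks_in_group, mode).

    Assumptions (hypothetical): the forward group is a contiguous block of
    ranks; the all-reduce group links ranks holding the same position in
    every block. `process_group` is None because nothing is created here.
    """
    if world_size % sequence_parallel_size != 0:
        raise ValueError("world_size must be divisible by sequence_parallel_size")
    # Group 1: ranks that exchange partial query/key/value in the forward pass.
    start = (rank // sequence_parallel_size) * sequence_parallel_size
    seq_ranks = list(range(start, start + sequence_parallel_size))
    # Group 2: ranks that all-reduce to keep model parameters in sync.
    dp_ranks = list(range(rank % sequence_parallel_size, world_size,
                          sequence_parallel_size))
    return [
        (seq_ranks.index(rank), len(seq_ranks), None, seq_ranks, "sequence"),
        (dp_ranks.index(rank), len(dp_ranks), None, dp_ranks, "sequence_dp"),
    ]
```

With 8 ranks and a sequence parallel size of 4, rank 5 would land in the forward group `[4, 5, 6, 7]` and the all-reduce group `[1, 5]`.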

class colossalai.context.process_group_initializer.Initializer_Pipeline(*args, **kwargs)

A ProcessGroupInitializer for pipeline parallelism.

Parameters
  • rank (int) – The rank of current process.

  • world_size (int) – Size of whole communication world.

  • config (Config) – Running configuration.

  • data_parallel_size (int) – Size of data parallel.

  • pipeline_parallel_size (int) – Size of pipeline parallel.

  • tensor_parallel_size (int) – Size of tensor parallel.

init_dist_group()

Initialize pipeline parallel groups and assign local_ranks and groups to each GPU.

Returns

Pipeline parallelism information as a list of tuples.

Return type

List[Tuple (local_rank, group_world_size, process_group, ranks_in_group, mode)]
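One way to picture a pipeline parallel group is as the set of ranks that hold the same position in each pipeline stage of one data parallel replica. The sketch below assumes a hypothetical layout in which ranks are ordered by data replica first, then pipeline stage, then position within the stage; the function name `pipeline_group_for_rank` is likewise an assumption, and no process group is created.

```python
def pipeline_group_for_rank(rank, world_size, data_parallel_size,
                            pipeline_parallel_size):
    """Return (local_rank, group_world_size, ranks_in_group) for `rank`.

    Assumption (hypothetical layout): global rank =
    replica * replica_size + stage * stage_size + offset_in_stage.
    """
    if world_size % data_parallel_size != 0:
        raise ValueError("world_size must be divisible by data_parallel_size")
    replica_size = world_size // data_parallel_size
    if replica_size % pipeline_parallel_size != 0:
        raise ValueError("replica size must be divisible by pipeline_parallel_size")
    stage_size = replica_size // pipeline_parallel_size

    replica = rank // replica_size
    stage, offset = divmod(rank % replica_size, stage_size)
    # Same replica and same offset, one rank per pipeline stage.
    ranks_in_group = [replica * replica_size + s * stage_size + offset
                      for s in range(pipeline_parallel_size)]
    return stage, len(ranks_in_group), ranks_in_group
```

For 8 ranks with `data_parallel_size=2` and `pipeline_parallel_size=2`, rank 6 sits in the second stage of the second replica, so its group is `[4, 6]` and its local rank is 1.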

class colossalai.context.process_group_initializer.Initializer_Data(*args, **kwargs)

A ProcessGroupInitializer for data parallelism.

Parameters
  • rank (int) – The rank of current process.

  • world_size (int) – Size of whole communication world.

  • config (Config) – Running configuration.

  • data_parallel_size (int) – Size of data parallel.

  • pipeline_parallel_size (int) – Size of pipeline parallel.

  • tensor_parallel_size (int) – Size of tensor parallel.

init_dist_group()

Initialize data parallel groups and assign local_ranks and groups to each GPU.

Returns

A tuple of data parallelism information.

Return type

Tuple (local_rank, group_world_size, process_group, ranks_in_group, mode)
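A data parallel group typically links the ranks that hold the same model shard in different replicas. The sketch below assumes a strided layout in which replicas occupy consecutive blocks of ranks; the function name `data_group_for_rank` and that layout are assumptions for illustration, not a statement of how the library assigns ranks.

```python
def data_group_for_rank(rank, world_size, data_parallel_size):
    """Return (local_rank, group_world_size, ranks_in_group) for `rank`.

    Assumption (hypothetical layout): each replica is a contiguous block of
    world_size // data_parallel_size ranks, so matching shards are spaced
    one block apart.
    """
    if world_size % data_parallel_size != 0:
        raise ValueError("world_size must be divisible by data_parallel_size")
    stride = world_size // data_parallel_size  # ranks per replica
    ranks_in_group = list(range(rank % stride, world_size, stride))
    return ranks_in_group.index(rank), len(ranks_in_group), ranks_in_group
```

With 8 ranks and `data_parallel_size=2`, rank 5 is paired with rank 1, giving `(1, 2, [1, 5])`.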

class colossalai.context.process_group_initializer.Initializer_2p5D(rank, world_size, config, data_parallel_size, pipeline_parallel_size, tensor_parallel_size, depth)

Serve as the single entry point to 2.5D (Tesseract) parallel initialization.

Parameters
  • rank (int) – The rank of current process.

  • world_size (int) – Size of whole communication world.

  • config (Config) – Running configuration.

  • data_parallel_size (int) – Size of data parallel.

  • pipeline_parallel_size (int) – Size of pipeline parallel.

  • tensor_parallel_size (int) – Size of tensor parallel.

  • depth (int) – The depth of 2.5D parallelism.

init_dist_group()

Initialize 2.5D tensor row, col, depth, and colXdepth parallel groups, and assign local_ranks and groups to each GPU.

Returns

The complete 2.5D tensor parallelism information as a list of tuples.

Return type

List[Tuple (local_rank, group_world_size, process_group, ranks_in_group, mode)]
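To make the four group types concrete, the sketch below arranges one tensor parallel group as a q x q x depth grid and lists the ranks that share a row, a column, a depth line, and a column-by-depth plane with a given rank. The function name `groups_2p5d_for_rank`, the dictionary keys, and the index order (column fastest, then row, then depth) are assumptions for illustration only.

```python
import math


def groups_2p5d_for_rank(rank, tensor_parallel_size, depth):
    """Return the rank lists of the row, col, depth and col-x-depth groups.

    Assumption (hypothetical layout): inside one tensor parallel group,
    local index = col + q * (row + q * layer), with q*q*depth ranks in total.
    """
    if tensor_parallel_size % depth != 0:
        raise ValueError("tensor_parallel_size must be divisible by depth")
    q = math.isqrt(tensor_parallel_size // depth)
    if q * q * depth != tensor_parallel_size:
        raise ValueError("tensor_parallel_size must equal q * q * depth")

    base = (rank // tensor_parallel_size) * tensor_parallel_size
    layer, rest = divmod(rank - base, q * q)
    row, col = divmod(rest, q)

    def at(c, r, z):
        # Global rank at grid position (col=c, row=r, layer=z).
        return base + c + q * (r + q * z)

    return {
        "row": [at(c, row, layer) for c in range(q)],
        "col": [at(col, r, layer) for r in range(q)],
        "depth": [at(col, row, z) for z in range(depth)],
        "col_x_depth": [at(col, r, z) for z in range(depth) for r in range(q)],
    }
```

For `tensor_parallel_size=8` and `depth=2` (so q = 2), rank 5 would belong to row `[4, 5]`, column `[5, 7]`, depth line `[1, 5]`, and column-by-depth plane `[1, 3, 5, 7]`.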

class colossalai.context.process_group_initializer.Initializer_2D(*args, **kwargs)

Serve as the single entry point to 2D parallel initialization.

Parameters
  • rank (int) – The rank of current process.

  • world_size (int) – Size of whole communication world.

  • config (Config) – Running configuration.

  • data_parallel_size (int) – Size of data parallel.

  • pipeline_parallel_size (int) – Size of pipeline parallel.

  • tensor_parallel_size (int) – Size of tensor parallel.

init_dist_group()

Initialize 2D tensor row and col parallel groups and assign local_ranks and groups to each GPU.

Returns

2D tensor parallelism information as a list of tuples.

Return type

List[Tuple (local_rank, group_world_size, process_group, ranks_in_group, mode)]
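The row and column groups can be pictured by laying out each tensor parallel group as a q x q grid. The sketch below does this under assumptions: the function name `groups_2d_for_rank`, the mode strings, and the row-major layout are hypothetical, and `None` stands in for a real process group.

```python
import math


def groups_2d_for_rank(rank, tensor_parallel_size):
    """Return [row_info, col_info], each shaped like
    (local_rank, group_world_size, process_group, ranks_in_group, mode).

    Assumption (hypothetical layout): each tensor parallel group is a
    row-major q x q grid, where q * q == tensor_parallel_size.
    """
    q = math.isqrt(tensor_parallel_size)
    if q * q != tensor_parallel_size:
        raise ValueError("tensor_parallel_size must be a perfect square")
    base = (rank // tensor_parallel_size) * tensor_parallel_size
    row, col = divmod(rank - base, q)
    row_ranks = [base + row * q + c for c in range(q)]  # same row
    col_ranks = [base + r * q + col for r in range(q)]  # same column
    return [
        (col, q, None, row_ranks, "2d_row"),
        (row, q, None, col_ranks, "2d_col"),
    ]
```

With `tensor_parallel_size=4`, rank 6 sits at row 1, column 0 of the second grid, so its row group is `[6, 7]` and its column group is `[4, 6]`.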

class colossalai.context.process_group_initializer.Initializer_3D(*args)

Serve as the single entry point to 3D parallel initialization.

Parameters
  • rank (int) – The rank of current process.

  • world_size (int) – Size of whole communication world.

  • config (Config) – Running configuration.

  • data_parallel_size (int) – Size of data parallel.

  • pipeline_parallel_size (int) – Size of pipeline parallel.

  • tensor_parallel_size (int) – Size of tensor parallel.

init_dist_group()

Initialize 3D tensor parallel groups and assign local_ranks and groups to each GPU.

Returns

The complete 3D tensor parallelism information as a list of tuples.

Return type

List[Tuple (local_rank, group_world_size, process_group, ranks_in_group, mode)]
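For 3D tensor parallelism, a tensor parallel group can be viewed as a cube of side l, with one group per axis through each rank. The sketch below is a minimal illustration assuming that view; the function name `groups_3d_for_rank`, the neutral axis names, and the index order are hypothetical and do not reflect the group names the library uses.

```python
def groups_3d_for_rank(rank, tensor_parallel_size):
    """Return the ranks sharing each cube axis with `rank`.

    Assumption (hypothetical layout): local index = i + l * (j + l * k),
    where l ** 3 == tensor_parallel_size.
    """
    side = round(tensor_parallel_size ** (1 / 3))
    if side ** 3 != tensor_parallel_size:
        raise ValueError("tensor_parallel_size must be a perfect cube")
    base = (rank // tensor_parallel_size) * tensor_parallel_size
    k, rest = divmod(rank - base, side * side)
    j, i = divmod(rest, side)

    def at(a, b, c):
        # Global rank at cube position (a, b, c).
        return base + a + side * (b + side * c)

    return {
        "axis_0": [at(a, j, k) for a in range(side)],
        "axis_1": [at(i, b, k) for b in range(side)],
        "axis_2": [at(i, j, c) for c in range(side)],
    }
```

With `tensor_parallel_size=8` (a 2 x 2 x 2 cube), rank 5 shares its three axes with `[4, 5]`, `[5, 7]`, and `[1, 5]` respectively.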

class colossalai.context.process_group_initializer.Initializer_1D(*args, **kwargs)

A ProcessGroupInitializer for 1D tensor parallelism.

Parameters
  • rank (int) – The rank of current process.

  • world_size (int) – Size of whole communication world.

  • config (Config) – Running configuration.

  • data_parallel_size (int) – Size of data parallel.

  • pipeline_parallel_size (int) – Size of pipeline parallel.

  • tensor_parallel_size (int) – Size of tensor parallel.

init_dist_group()

Initialize 1D tensor parallel groups and assign local_ranks and groups to each GPU.

Returns

1D tensor parallelism information as a tuple.

Return type

Tuple (local_rank, group_world_size, process_group, ranks_in_group, mode)

class colossalai.context.process_group_initializer.ProcessGroupInitializer(rank, world_size, config, data_parallel_size, pipeline_parallel_size, tensor_parallel_size)

An object, knowing the parallelism configuration, that initializes parallel groups.

Parameters
  • rank (int) – The rank of current process.

  • world_size (int) – Size of whole communication world.

  • config (Config) – Running configuration.

  • data_parallel_size (int) – Size of data parallel.

  • pipeline_parallel_size (int) – Size of pipeline parallel.

  • tensor_parallel_size (int) – Size of tensor parallel.
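The pattern shared by the initializers on this page (store the parallelism configuration, then expose an `init_dist_group()` method) can be sketched with a small abstract base class. The class names `SketchProcessGroupInitializer` and `SketchWorldInitializer`, the `"global"` mode string, and the assumption that the base class declares `init_dist_group()` as abstract are all hypothetical; this is not the library's code.

```python
from abc import ABC, abstractmethod


class SketchProcessGroupInitializer(ABC):
    """Hypothetical stand-in that only stores the documented parameters."""

    def __init__(self, rank, world_size, config, data_parallel_size,
                 pipeline_parallel_size, tensor_parallel_size):
        self.rank = rank
        self.world_size = world_size
        self.config = config
        self.data_parallel_size = data_parallel_size
        self.pipeline_parallel_size = pipeline_parallel_size
        self.tensor_parallel_size = tensor_parallel_size

    @abstractmethod
    def init_dist_group(self):
        """Subclasses decide how ranks are grouped."""


class SketchWorldInitializer(SketchProcessGroupInitializer):
    """Hypothetical subclass: one group containing every rank."""

    def init_dist_group(self):
        ranks_in_group = list(range(self.world_size))
        # None stands in for a real process group handle.
        return (self.rank, self.world_size, None, ranks_in_group, "global")
```

A subclass built this way returns the same tuple shape as the initializers documented here, for example `(2, 4, None, [0, 1, 2, 3], "global")` for rank 2 of 4.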

class colossalai.context.process_group_initializer.Initializer_Model(*args, **kwargs)

A ProcessGroupInitializer for model parallelism (model parallel group contains pipeline and tensor parallel groups).

Parameters
  • rank (int) – The rank of current process.

  • world_size (int) – Size of whole communication world.

  • config (Config) – Running configuration.

  • data_parallel_size (int) – Size of data parallel.

  • pipeline_parallel_size (int) – Size of pipeline parallel.

  • tensor_parallel_size (int) – Size of tensor parallel.

init_dist_group()

Initialize model parallel groups and assign local_ranks and groups to each GPU.

Returns

A tuple of model parallelism information.

Return type

Tuple (local_rank, group_world_size, process_group, ranks_in_group, mode)
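Because a model parallel group contains both the pipeline and the tensor parallel groups, its size can be taken as the product of the two sizes. The sketch below illustrates that relationship; the function name `model_group_for_rank` and the contiguous rank layout are assumptions, and no process group is created.

```python
def model_group_for_rank(rank, world_size, pipeline_parallel_size,
                         tensor_parallel_size):
    """Return (local_rank, group_world_size, ranks_in_group) for `rank`.

    Assumption (hypothetical layout): one model replica occupies a
    contiguous block of pipeline_parallel_size * tensor_parallel_size ranks.
    """
    model_parallel_size = pipeline_parallel_size * tensor_parallel_size
    if world_size % model_parallel_size != 0:
        raise ValueError("world_size must be divisible by the model parallel size")
    start = (rank // model_parallel_size) * model_parallel_size
    ranks_in_group = list(range(start, start + model_parallel_size))
    return ranks_in_group.index(rank), model_parallel_size, ranks_in_group
```

With 8 ranks, `pipeline_parallel_size=2`, and `tensor_parallel_size=2`, rank 5 falls in the model parallel group `[4, 5, 6, 7]` with local rank 1.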