colossalai.context.process_group_initializer.initializer_moe

class colossalai.context.process_group_initializer.initializer_moe.Initializer_Moemodel(moe_model, moe_data, *args, **kwargs)

Model parallel initialization for MoE system.

Parameters
  • moe_model (int) – Size of MoE model parallelism

  • moe_data (int) – Size of MoE data parallelism

  • args – Args used in base class

  • kwargs – Kwargs used in base class

init_dist_group()

Initializes the model parallel groups in the MoE parallel environment and assigns a local rank and a group to each GPU.

Returns

Information about the MoE model parallel group of the current process

Return type

Tuple(local_rank, group_world_size, process_group, ranks_in_group, mode)
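
To make the grouping concrete, the following is a minimal sketch in plain Python of how ranks might be partitioned into MoE model parallel groups. It is not the library's implementation: the function name moe_model_group_ranks is hypothetical, the world size is assumed to equal moe_model * moe_data, and each model parallel group is assumed to hold moe_model consecutive ranks. No process groups are created here; only the rank lists are computed.

```python
from typing import List


def moe_model_group_ranks(moe_model: int, moe_data: int) -> List[List[int]]:
    """Return the rank list of every MoE model parallel group.

    Assumption (not confirmed by the documentation above): the world size is
    moe_model * moe_data and each group holds moe_model consecutive ranks.
    """
    if moe_model < 1 or moe_data < 1:
        raise ValueError("moe_model and moe_data must be positive integers")
    # One group per data parallel index i; ranks inside a group are contiguous.
    return [
        [i * moe_model + j for j in range(moe_model)]
        for i in range(moe_data)
    ]


# Example layout for 8 processes: 2 groups of 4 consecutive ranks each.
model_groups = moe_model_group_ranks(moe_model=4, moe_data=2)
for ranks in model_groups:
    print(ranks)
```

Under these assumptions, a process would find its local_rank as the position of its global rank inside its own rank list, and group_world_size as the length of that list.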

class colossalai.context.process_group_initializer.initializer_moe.Initializer_Moedata(moe_model, moe_data, *args, **kwargs)

Data parallel initialization for MoE system.

Parameters
  • moe_model (int) – Size of MoE model parallelism

  • moe_data (int) – Size of MoE data parallelism

  • args – Args used in base class

  • kwargs – Kwargs used in base class

init_dist_group()

Initializes the data parallel groups in the MoE parallel environment and assigns a local rank and a group to each GPU.

Returns

Information about the MoE data parallel group of the current process

Return type

Tuple(local_rank, group_world_size, process_group, ranks_in_group, mode)
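
As an illustration of the data parallel side, here is a minimal plain-Python sketch of one plausible rank layout. It is an assumption rather than the library's code: the helper name moe_data_group_ranks is hypothetical, the world size is taken to be moe_model * moe_data, and each data parallel group is assumed to collect the ranks that share the same offset within their block of moe_model consecutive ranks, so members are spaced moe_model apart.

```python
from typing import List


def moe_data_group_ranks(moe_model: int, moe_data: int) -> List[List[int]]:
    """Return the rank list of every MoE data parallel group.

    Assumption (not confirmed by the documentation above): ranks in one data
    parallel group are strided by moe_model, giving moe_model groups of
    moe_data ranks each.
    """
    if moe_model < 1 or moe_data < 1:
        raise ValueError("moe_model and moe_data must be positive integers")
    # One group per offset; members are offset, offset + moe_model, ...
    return [
        [offset + j * moe_model for j in range(moe_data)]
        for offset in range(moe_model)
    ]


# Example layout for 8 processes: 4 groups of 2 ranks, spaced 4 apart.
data_groups = moe_data_group_ranks(moe_model=4, moe_data=2)
for ranks in data_groups:
    print(ranks)
```

With this layout every rank appears in exactly one data parallel group, and each group contains one rank from every block of moe_model consecutive ranks.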

class colossalai.context.process_group_initializer.initializer_moe.Initializer_Moe(*args, **kwargs)

Serves as the single entry point to MoE parallel initialization.

Parameters
  • args – Args used to initialize ProcessGroupInitializer

  • kwargs – Kwargs used to initialize ProcessGroupInitializer

init_dist_group()

Initializes MoE parallel communication groups.

Returns

MoE parallelism’s information

Return type

list of Tuples (local_rank, group_world_size, process_group, ranks_in_group, mode)
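
The shape of the returned list can be sketched without any distributed backend. The example below is a hedged illustration, not the library's implementation: GroupInfo and moe_group_info are hypothetical names, the mode strings "moe_model" and "moe_data" are placeholders for whatever mode values the library uses, the order of the two entries is assumed, process_group is left as None because no communication group is actually created, and the rank layout (contiguous model parallel groups, strided data parallel groups) is an assumption.

```python
from typing import List, NamedTuple, Optional


class GroupInfo(NamedTuple):
    """Hypothetical mirror of the documented tuple fields."""
    local_rank: int
    group_world_size: int
    process_group: Optional[object]  # None here: no real group is created
    ranks_in_group: List[int]
    mode: str  # placeholder label, not a real library constant


def moe_group_info(rank: int, moe_model: int, moe_data: int) -> List[GroupInfo]:
    """Compute both MoE group descriptions for one global rank.

    Assumes world size == moe_model * moe_data, contiguous model parallel
    groups, and data parallel groups strided by moe_model.
    """
    if not 0 <= rank < moe_model * moe_data:
        raise ValueError("rank is outside the assumed world size")
    # block: index of the model parallel group; offset: position inside it.
    block, offset = divmod(rank, moe_model)
    model_ranks = [block * moe_model + j for j in range(moe_model)]
    data_ranks = [offset + j * moe_model for j in range(moe_data)]
    return [
        GroupInfo(offset, moe_model, None, model_ranks, "moe_model"),
        GroupInfo(block, moe_data, None, data_ranks, "moe_data"),
    ]


# Describe both groups for global rank 5 in a 4 x 2 layout.
for info in moe_group_info(rank=5, moe_model=4, moe_data=2):
    print(info.mode, info.local_rank, info.group_world_size, info.ranks_in_group)
```

Returning one tuple per parallel mode lets a caller register each group under its own mode in a single loop, which is consistent with this class being described as the single entry point.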