colossalai.context.process_group_initializer.initializer_moe
- class colossalai.context.process_group_initializer.initializer_moe.Initializer_Moemodel(moe_model, moe_data, *args, **kwargs)
Model parallel initialization for MoE system.
- Parameters
moe_model (int) – Size of the MoE model parallel group
moe_data (int) – Size of the MoE data parallel group
args – Args used in base class
kwargs – Kwargs used in base class
- init_dist_group()
Initializes the model parallel groups in the MoE parallel environment and assigns a local rank and process group to each GPU.
- Returns
Information about the MoE model parallel group that the calling rank belongs to
- Return type
Tuple(local_rank, group_world_size, process_group, ranks_in_group, mode)
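To illustrate the first, second, and fourth fields of the returned tuple, here is a minimal pure-Python sketch. It assumes that model parallel groups are formed from contiguous ranks, which this page does not state, and it creates no real process group. The helper names moe_model_groups and describe_rank are hypothetical and are not part of ColossalAI.

```python
from typing import List, Tuple


def moe_model_groups(world_size: int, moe_model: int) -> List[List[int]]:
    """Split ranks 0..world_size-1 into contiguous groups of size moe_model.

    The contiguous layout is an assumption made for illustration only.
    """
    if world_size % moe_model != 0:
        raise ValueError("world_size must be divisible by moe_model")
    return [
        list(range(start, start + moe_model))
        for start in range(0, world_size, moe_model)
    ]


def describe_rank(rank: int, groups: List[List[int]]) -> Tuple[int, int, List[int]]:
    """Return (local_rank, group_world_size, ranks_in_group) for the given rank."""
    for ranks in groups:
        if rank in ranks:
            return ranks.index(rank), len(ranks), ranks
    raise ValueError(f"rank {rank} does not belong to any group")


groups = moe_model_groups(world_size=8, moe_model=2)
print(groups)                    # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(describe_rank(5, groups))  # (1, 2, [4, 5])
```

In a real run, the process_group and mode fields would come from the distributed backend and the library's parallel mode enumeration; they are omitted here because their exact form is not documented on this page.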
- class colossalai.context.process_group_initializer.initializer_moe.Initializer_Moedata(moe_model, moe_data, *args, **kwargs)
Data parallel initialization for MoE system.
- Parameters
moe_model (int) – Size of the MoE model parallel group
moe_data (int) – Size of the MoE data parallel group
args – Args used in base class
kwargs – Kwargs used in base class
- init_dist_group()
Initializes the data parallel groups in the MoE parallel environment and assigns a local rank and process group to each GPU.
- Returns
Information about the MoE data parallel group that the calling rank belongs to
- Return type
Tuple(local_rank, group_world_size, process_group, ranks_in_group, mode)
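The sketch below shows one way data parallel ranks could be grouped. It assumes a strided layout, in which each data parallel group takes every k-th rank with k equal to the number of groups. That layout is an assumption for illustration and is not confirmed by this page. The helper name moe_data_groups is hypothetical.

```python
from typing import List


def moe_data_groups(world_size: int, moe_data: int) -> List[List[int]]:
    """Split ranks into strided groups of size moe_data.

    With num_groups = world_size // moe_data, group i holds ranks
    i, i + num_groups, i + 2 * num_groups, ... (assumed layout).
    """
    if world_size % moe_data != 0:
        raise ValueError("world_size must be divisible by moe_data")
    num_groups = world_size // moe_data
    return [
        [offset + j * num_groups for j in range(moe_data)]
        for offset in range(num_groups)
    ]


groups = moe_data_groups(world_size=8, moe_data=4)
print(groups)  # [[0, 2, 4, 6], [1, 3, 5, 7]]

# local_rank and group_world_size for rank 5 under this layout
ranks_in_group = next(g for g in groups if 5 in g)
print(ranks_in_group.index(5), len(ranks_in_group))  # 2 4
```

A strided layout of this kind keeps each data parallel group disjoint from contiguous model parallel groups of size world_size // moe_data, so every rank sits in exactly one group of each kind.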
- class colossalai.context.process_group_initializer.initializer_moe.Initializer_Moe(*args, **kwargs)
Serves as the single entry point to MoE parallel initialization.
- Parameters
args – Args used to initialize ProcessGroupInitializer
kwargs – Kwargs used to initialize ProcessGroupInitializer
- init_dist_group()
Initializes MoE parallel communication groups.
- Returns
Information about the MoE model parallel and MoE data parallel groups
- Return type
list of Tuples (local_rank, group_world_size, process_group, ranks_in_group, mode)
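As a rough picture of the list this entry point returns, the following sketch builds one tuple for model parallelism and one for data parallelism for a single rank. Several things here are assumptions: that moe_model * moe_data equals the world size, that model groups are contiguous and data groups are strided, and that the list holds the model entry first. The process_group field is replaced by None and the mode field by a plain string, since no distributed backend is involved. The function name moe_parallel_info is hypothetical.

```python
from typing import List, Optional, Tuple

# (local_rank, group_world_size, process_group, ranks_in_group, mode)
GroupInfo = Tuple[int, int, Optional[object], List[int], str]


def moe_parallel_info(rank: int, world_size: int,
                      moe_model: int, moe_data: int) -> List[GroupInfo]:
    """Build placeholder MoE group information for one rank (illustrative only)."""
    if moe_model * moe_data != world_size:
        raise ValueError("assumed constraint: moe_model * moe_data == world_size")
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")

    # Assumed contiguous layout for the model parallel group.
    model_start = (rank // moe_model) * moe_model
    model_ranks = list(range(model_start, model_start + moe_model))

    # Assumed strided layout for the data parallel group.
    stride = world_size // moe_data
    data_ranks = [rank % stride + j * stride for j in range(moe_data)]

    return [
        (model_ranks.index(rank), moe_model, None, model_ranks, "moe_model"),
        (data_ranks.index(rank), moe_data, None, data_ranks, "moe_data"),
    ]


for info in moe_parallel_info(rank=5, world_size=8, moe_model=2, moe_data=4):
    print(info)
# (1, 2, None, [4, 5], 'moe_model')
# (2, 4, None, [1, 3, 5, 7], 'moe_data')
```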