colossalai.utils.moe

colossalai.utils.moe.get_moe_epsize_param_dict(model)

Returns a dictionary of the model's parameters keyed by expert parallel size (ep_size). Parameters that use data parallelism are replicated on every GPU, so their ep_size is set to 1.

Parameters: model (torch.nn.Module) – A PyTorch nn.Module whose parameters are used to build the dictionary.
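The grouping described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation. The `FakeParam` class, the `ep_size` attribute, and the function name `group_params_by_ep_size` are hypothetical, chosen so the example runs without PyTorch or ColossalAI. It also assumes that each dictionary value is a list of parameters, which the description above does not state.

```python
from collections import defaultdict


class FakeParam:
    """Hypothetical stand-in for a model parameter (not a ColossalAI type)."""

    def __init__(self, name, ep_size=None):
        self.name = name
        if ep_size is not None:
            # Assumed attribute marking an expert (MoE) parameter.
            self.ep_size = ep_size


def group_params_by_ep_size(params):
    """Group parameters by expert parallel size, defaulting to 1."""
    groups = defaultdict(list)
    for p in params:
        # Parameters without an ep_size are treated as data parallel.
        groups[getattr(p, "ep_size", 1)].append(p)
    return dict(groups)


params = [
    FakeParam("embed.weight"),
    FakeParam("expert.w1", ep_size=4),
    FakeParam("expert.w2", ep_size=4),
    FakeParam("head.weight"),
]
grouped = group_params_by_ep_size(params)
```

Here `grouped` has two keys: 1 for the replicated parameters and 4 for the expert parameters.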

colossalai.utils.moe.sync_moe_model_param(model)

Makes sure model parameters are consistent in the MoE parallel context.

Parameters: model (torch.nn.Module) – A PyTorch model whose parameters are checked for consistency.
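To make "consistent" concrete, the sketch below simulates three replicas of one parameter as plain Python dictionaries and copies the values from a source replica into the others. This is only one common way to bring replicas into agreement. Whether sync_moe_model_param works this way is not stated here, and the names `sync_replicas` and `is_consistent` are hypothetical.

```python
def is_consistent(replicas):
    """Return True if every replica holds the same parameter values."""
    return all(state == replicas[0] for state in replicas)


def sync_replicas(replicas, src=0):
    """Copy the source replica's values into every other replica in place."""
    reference = replicas[src]
    for rank, state in enumerate(replicas):
        if rank == src:
            continue
        for name, values in reference.items():
            # Copy the list so replicas do not share one object.
            state[name] = list(values)
    return replicas


# Each dict simulates the parameters held by one rank.
replicas = [
    {"gate.weight": [0.1, 0.2]},
    {"gate.weight": [0.3, 0.4]},
    {"gate.weight": [0.5, 0.6]},
]
before = is_consistent(replicas)
sync_replicas(replicas)
after = is_consistent(replicas)
```

In a real distributed run each rank holds only its own copy, so the copy step would be a collective operation rather than a loop over a list.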