colossalai.utils.moe

colossalai.utils.moe.get_moe_epsize_param_dict(model)

Returns a parameter dictionary keyed by expert parallel size (ep_size). Parameters under data parallelism are replicated on every GPU, so their ep_size is set to 1.

Parameters: model (torch.nn.Module) – A PyTorch nn.Module from which the dictionary is built
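
The grouping described above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `group_params_by_ep_size`, the `moe_info` attribute, and the choice of a list as the dictionary value are assumptions made for illustration, and plain `SimpleNamespace` objects stand in for real parameters so the example runs without a GPU or a distributed launch.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_params_by_ep_size(params):
    """Group parameter-like objects by expert parallel size.

    Assumption: expert parameters carry a `moe_info` attribute with an
    `ep_size` field; anything without it is treated as data parallel.
    """
    groups = defaultdict(list)
    for param in params:
        moe_info = getattr(param, "moe_info", None)
        # Data-parallel parameters are replicated on every GPU, so ep_size is 1.
        ep_size = 1 if moe_info is None else moe_info.ep_size
        groups[ep_size].append(param)
    return dict(groups)


# Hypothetical stand-ins: two replicated parameters and one expert parameter.
dense_a = SimpleNamespace(name="attention.weight")
dense_b = SimpleNamespace(name="norm.weight")
expert = SimpleNamespace(name="experts.weight", moe_info=SimpleNamespace(ep_size=4))

grouped = group_params_by_ep_size([dense_a, dense_b, expert])
```

With a real model, the same idea would apply to the iterable returned by `model.parameters()`. Grouping by ep_size is useful when each group needs different handling, for example a different communication group.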

colossalai.utils.moe.sync_moe_model_param(model)

Makes sure the model parameters are consistent in the MoE parallel context.

Parameters: model (torch.nn.Module) – A PyTorch nn.Module whose parameters are checked for consistency