colossalai.utils.moe
- colossalai.utils.moe.get_moe_epsize_param_dict(model)
Returns a dictionary that groups the model's parameters by expert parallel size: each key is an ep_size, and its value is the list of parameters with that ep_size. Since data-parallel parameters are replicated on every GPU, their ep_size is set to 1.
- Parameters
model (torch.nn.Module) – A PyTorch nn.Module from which the dictionary is built.
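A minimal sketch of the grouping described above, not ColossalAI's implementation. To run without PyTorch or ColossalAI, it uses hypothetical stand-in classes (`FakeParam`, `FakeModel`), and it assumes expert parameters carry an `ep_size` attribute, which is an illustrative assumption rather than ColossalAI's real attribute name. Untagged parameters are treated as data-parallel and filed under ep_size 1.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter."""
    name: str
    # Hypothetical tag: expert parallel size for MoE expert weights, None otherwise.
    ep_size: Optional[int] = None


@dataclass
class FakeModel:
    """Hypothetical stand-in for torch.nn.Module exposing .parameters()."""
    params: List[FakeParam] = field(default_factory=list)

    def parameters(self):
        return iter(self.params)


def group_params_by_ep_size(model) -> Dict[int, List[FakeParam]]:
    """Group parameters by expert parallel size (sketch only)."""
    groups: Dict[int, List[FakeParam]] = defaultdict(list)
    for param in model.parameters():
        # Untagged parameters are data-parallel: replicated on every GPU,
        # so they are filed under ep_size = 1.
        ep_size = param.ep_size if param.ep_size is not None else 1
        groups[ep_size].append(param)
    return dict(groups)


model = FakeModel([
    FakeParam("embed.weight"),
    FakeParam("experts.0.w1", ep_size=4),
    FakeParam("experts.0.w2", ep_size=4),
    FakeParam("head.weight"),
])
groups = group_params_by_ep_size(model)
print({k: [p.name for p in v] for k, v in groups.items()})
# → {1: ['embed.weight', 'head.weight'], 4: ['experts.0.w1', 'experts.0.w2']}
```

Keying by ep_size lets later steps handle every parameter that shares a communication pattern in one pass.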
- colossalai.utils.moe.sync_moe_model_param(model)
Makes sure model parameters are consistent across ranks in the MoE parallel context.
- Parameters
model (torch.nn.Module) – A PyTorch nn.Module whose parameters are synchronized.
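The following is a single-process simulation of one way to achieve that consistency, not ColossalAI's implementation. It assumes each parameter is broadcast from the first rank of its data-parallel group (what `torch.distributed.broadcast` would do across processes), and that ranks are laid out so rank `i * ep_size + j` holds expert shard `j`. The helpers `dp_groups` and `simulate_sync` are hypothetical.

```python
from typing import Dict, List


def dp_groups(world_size: int, ep_size: int) -> List[List[int]]:
    """Hypothetical layout: ranks holding the same expert shard form one data-parallel group."""
    dp_size = world_size // ep_size
    return [[i * ep_size + j for i in range(dp_size)] for j in range(ep_size)]


def simulate_sync(values: List[Dict[str, float]], param_ep: Dict[str, int]) -> None:
    """Copy each parameter from the first rank of every data-parallel group to the rest."""
    world_size = len(values)
    for name, ep_size in param_ep.items():
        # ep_size == 1: one group spanning all ranks (plain data parallelism).
        # ep_size == world_size: groups of size 1, so nothing is copied.
        for group in dp_groups(world_size, ep_size):
            src = group[0]
            for rank in group[1:]:
                values[rank][name] = values[src][name]


# Four simulated ranks whose copies start out inconsistent.
values = [{"dense.w": 0.1 * (r + 1), "expert.w": float(r + 1)} for r in range(4)]
simulate_sync(values, {"dense.w": 1, "expert.w": 2})
print([v["dense.w"] for v in values])   # → [0.1, 0.1, 0.1, 0.1]
print([v["expert.w"] for v in values])  # → [1.0, 2.0, 1.0, 2.0]
```

Dense weights end up identical everywhere, while each expert shard is made consistent only among the ranks that replicate it (ranks 0 and 2 for shard 0, ranks 1 and 3 for shard 1).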