colossalai.nn.layer.moe.experts

class colossalai.nn.layer.moe.experts.MoeExperts(comm_name, num_experts): Base class for experts in MoE. It stores the kind of communication the experts use to exchange tokens, the number of experts on a single GPU, and parallel information such as the expert parallel size, the data parallel size, and their distributed communication groups.

class colossalai.nn.layer.moe.experts.Experts(expert_cls, num_experts, **expert_args)

A wrapper class to create experts. It creates E experts across the MoE model parallel group, where E is the number of experts. Every expert is an instance of the class passed as expert_cls in the initialization parameters.

Parameters

expert_cls – The class of all experts
num_experts (int) – The number of experts
expert_args – Keyword arguments used to initialize each expert
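
The wrapper pattern described above can be sketched in plain PyTorch. This is a minimal single-process illustration, not the ColossalAI implementation: the class name SimpleExperts and the input layout (one chunk of tokens per expert) are assumptions made for the example, and the distribution of experts across the MoE model parallel group is left out entirely.

```python
import torch
import torch.nn as nn


class SimpleExperts(nn.Module):
    """Hypothetical stand-in for an experts wrapper (single process only)."""

    def __init__(self, expert_cls, num_experts, **expert_args):
        super().__init__()
        # Build num_experts independent instances of expert_cls,
        # each initialized with the same keyword arguments.
        self.experts = nn.ModuleList(
            [expert_cls(**expert_args) for _ in range(num_experts)]
        )

    def forward(self, inputs):
        # Assumed layout: inputs has shape (num_experts, capacity, d_model),
        # and chunk i is processed by expert i.
        outputs = [expert(chunk) for expert, chunk in zip(self.experts, inputs)]
        return torch.stack(outputs, dim=0)


# Four linear experts, each mapping 8 features to 8 features.
experts = SimpleExperts(nn.Linear, 4, in_features=8, out_features=8)
outputs = experts(torch.randn(4, 3, 8))
```

Each expert holds its own parameters, so the four nn.Linear instances here do not share weights.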

class colossalai.nn.layer.moe.experts.FFNExperts(num_experts, d_model, d_ff, activation=None, drop_rate=0): Uses torch.bmm to speed up the computation across multiple experts.
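
To show why torch.bmm helps, the sketch below computes a two-layer feed-forward pass for several experts in one batched call and compares it with a per-expert loop. It is an illustration under assumptions, not the FFNExperts source: the ReLU activation, the absence of bias and dropout, and the names batched_ffn and looped_ffn are all chosen for the example.

```python
import torch

torch.manual_seed(0)
num_experts, capacity, d_model, d_ff = 4, 3, 8, 16

# One stacked weight tensor per layer; slice i belongs to expert i.
tokens = torch.randn(num_experts, capacity, d_model)
w1 = torch.randn(num_experts, d_model, d_ff)
w2 = torch.randn(num_experts, d_ff, d_model)


def batched_ffn(tokens, w1, w2):
    # torch.bmm multiplies matching slices along the first dimension,
    # so all experts are evaluated in a single call per layer.
    hidden = torch.relu(torch.bmm(tokens, w1))
    return torch.bmm(hidden, w2)


def looped_ffn(tokens, w1, w2):
    # Reference version: one pair of matrix products per expert.
    outs = [torch.relu(x @ a) @ b for x, a, b in zip(tokens, w1, w2)]
    return torch.stack(outs, dim=0)


out = batched_ffn(tokens, w1, w2)
reference = looped_ffn(tokens, w1, w2)
```

Both functions produce the same result up to floating-point rounding; the batched version replaces a Python loop over experts with two batched matrix products.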

class colossalai.nn.layer.moe.experts.TPExperts(num_experts, d_model, d_ff, activation=None, drop_rate=0): Uses tensor parallelism to split each expert evenly, which makes it possible to deploy experts when the number of experts is not divisible by the maximum expert parallel size, or the maximum expert parallel size is not divisible by the number of experts.
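
One common way to split a feed-forward expert evenly is to partition the d_ff dimension: the first weight matrix is split by columns, the second by rows, and the partial outputs are summed. The sketch below simulates this on a single process. Whether TPExperts uses exactly this layout is an assumption, as are the ReLU activation and the variable names; in a real deployment each shard would live on a different device and the final sum would be an all-reduce.

```python
import torch

torch.manual_seed(0)
d_model, d_ff, tp_size = 8, 16, 4

x = torch.randn(5, d_model)
w1 = torch.randn(d_model, d_ff)
w2 = torch.randn(d_ff, d_model)

# Unsplit expert, used as the reference result.
full = torch.relu(x @ w1) @ w2

# Split d_ff into tp_size equal parts: columns of w1, rows of w2.
w1_shards = torch.chunk(w1, tp_size, dim=1)
w2_shards = torch.chunk(w2, tp_size, dim=0)

# Each shard computes a partial output from its slice of the hidden units.
partials = [torch.relu(x @ a) @ b for a, b in zip(w1_shards, w2_shards)]

# Summing the partial outputs stands in for an all-reduce across devices.
sharded = torch.stack(partials, dim=0).sum(dim=0)
```

The split is exact because the activation is applied elementwise, so each hidden unit depends only on its own column of the first weight matrix.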