colossalai.nn.layer.moe.layers

class colossalai.nn.layer.moe.layers.Top1Router(capacity_factor_train=1.25, capacity_factor_eval=2.0, min_capacity=4, select_policy='first', noisy_func=None, drop_tks=True)

Top-1 router that returns the dispatch mask [s, e, c] and the combine weight [s, e, c] used for routing. See Google's Switch Transformer paper for a more detailed description.

Parameters
  • capacity_factor_train (float, optional) – Capacity factor used for routing during training

  • capacity_factor_eval (float, optional) – Capacity factor used for routing during evaluation

  • min_capacity (int, optional) – Minimum capacity of each expert

  • select_policy (str, optional) – Policy for selecting tokens

  • noisy_func (Callable, optional) – Noise function applied to the logits

  • drop_tks (bool, optional) – Whether to drop tokens during evaluation
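
The following is a minimal, single-process sketch of top-1 routing with a capacity limit, written in plain Python for illustration only; it is not ColossalAI's implementation. It assumes that s, e, and c stand for the number of tokens, the number of experts, and the per-expert capacity, that capacity is the even share of tokens per expert scaled by the capacity factor and bounded below by min_capacity, and that a 'first' policy lets earlier tokens claim slots first. The helper names (softmax, expert_capacity, top1_route) are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one token's gate logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [x / total for x in exps]

def expert_capacity(num_tokens, num_experts, capacity_factor, min_capacity):
    # Assumed rule: scale the even share of tokens per expert, then apply the floor.
    return max(min_capacity, math.ceil(capacity_factor * num_tokens / num_experts))

def top1_route(logits, capacity_factor=1.25, min_capacity=4):
    s, e = len(logits), len(logits[0])
    c = expert_capacity(s, e, capacity_factor, min_capacity)
    dispatch = [[[0] * c for _ in range(e)] for _ in range(s)]    # [s, e, c]
    combine = [[[0.0] * c for _ in range(e)] for _ in range(s)]   # [s, e, c]
    used = [0] * e  # number of slots already taken per expert
    for t, row in enumerate(logits):
        probs = softmax(row)
        k = max(range(e), key=probs.__getitem__)  # highest-probability expert
        if used[k] < c:
            # Assumed 'first' policy: earlier tokens claim slots first.
            slot = used[k]
            dispatch[t][k][slot] = 1
            combine[t][k][slot] = probs[k]
            used[k] += 1
        # Otherwise the expert is full and the token is dropped (all zeros).
    return dispatch, combine, c

# Six tokens, two experts; five of the tokens prefer expert 0.
logits = [[2.0, 0.0]] * 4 + [[0.0, 2.0], [2.0, 0.0]]
dispatch, combine, capacity = top1_route(logits, capacity_factor=1.0, min_capacity=1)
print(capacity)  # 3
```

In this example expert 0 has three slots, so the fourth and sixth tokens overflow and keep all-zero rows in both tensors.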

class colossalai.nn.layer.moe.layers.Top2Router(capacity_factor_train=1.25, capacity_factor_eval=2.0, min_capacity=4, noisy_func=None, drop_tks=True)

Top-2 router that returns the dispatch mask [s, e, c] and the combine weight [s, e, c] used for routing. See the ViT-MoE paper for a more detailed description.

Parameters
  • capacity_factor_train (float, optional) – Capacity factor used for routing during training

  • capacity_factor_eval (float, optional) – Capacity factor used for routing during evaluation

  • min_capacity (int, optional) – Minimum capacity of each expert

  • noisy_func (Callable, optional) – Noise function applied to the logits

  • drop_tks (bool, optional) – Whether to drop tokens during evaluation
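
As a rough illustration of how top-2 routing differs from top-1, the sketch below sends each token to its two highest-probability experts. It is plain Python and not ColossalAI's code. Two assumptions are made: all first choices are placed before any second choice, and the combine weights are the raw softmax probabilities without renormalization. The function name top2_route and its capacity argument are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one token's gate logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [x / total for x in exps]

def top2_route(logits, capacity):
    s, e = len(logits), len(logits[0])
    dispatch = [[[0] * capacity for _ in range(e)] for _ in range(s)]    # [s, e, c]
    combine = [[[0.0] * capacity for _ in range(e)] for _ in range(s)]   # [s, e, c]
    used = [0] * e  # number of slots already taken per expert
    probs = [softmax(row) for row in logits]
    # For every token, keep the indices of its two most probable experts.
    choices = [sorted(range(e), key=lambda k: -p[k])[:2] for p in probs]
    # Assumption: every first choice is placed before any second choice.
    for rank in range(2):
        for t in range(s):
            k = choices[t][rank]
            if used[k] < capacity:
                slot = used[k]
                dispatch[t][k][slot] = 1
                combine[t][k][slot] = probs[t][k]
                used[k] += 1
            # Otherwise this choice is dropped because the expert is full.
    return dispatch, combine

# Three tokens, three experts, two slots per expert.
logits = [[3.0, 2.0, 1.0], [3.0, 1.0, 2.0], [1.0, 3.0, 2.0]]
dispatch, combine = top2_route(logits, capacity=2)
print([sum(map(sum, dispatch[t])) for t in range(3)])  # [2, 2, 2]
```

With two slots per expert every token reaches both of its chosen experts; shrinking the capacity to one slot would leave each token with a single surviving assignment.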

class colossalai.nn.layer.moe.layers.MoeLayer(dim_model, num_experts, router, experts)

A MoE layer that passes its input tensor through its gate and uses the output logits to route all tokens. Its main job is to exchange tokens among the experts across the MoE tensor group via all-to-all communication. It then collects the outputs of all experts, exchanges them back, and finally returns the output of the MoE system.

Parameters
  • dim_model (int) – Dimension of model

  • num_experts (int) – The number of experts

  • router (nn.Module) – Router instance used for routing

  • experts (nn.Module) – Instance of experts generated by Expert
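
The dispatch, expert, and combine steps described above can be sketched on a single process as follows. This is an illustrative plain-Python model, not the MoeLayer implementation: the all-to-all exchanges are only marked by a comment, experts are modeled as ordinary callables on lists of floats, and the function name moe_forward is hypothetical. The dispatch and combine arguments are assumed to have the [s, e, c] layout returned by a router.

```python
def moe_forward(tokens, dispatch, combine, experts):
    s, d = len(tokens), len(tokens[0])
    e, c = len(dispatch[0]), len(dispatch[0][0])

    # Dispatch: copy each routed token into its expert's slot buffer [e, c, d].
    expert_in = [[[0.0] * d for _ in range(c)] for _ in range(e)]
    for t in range(s):
        for k in range(e):
            for slot in range(c):
                if dispatch[t][k][slot]:
                    expert_in[k][slot] = list(tokens[t])

    # In a distributed run, an all-to-all exchange would move each buffer to
    # the device that owns the expert, and a second one would return outputs.
    expert_out = [[experts[k](vec) for vec in expert_in[k]] for k in range(e)]

    # Combine: weight each expert output and add it to its source token.
    out = [[0.0] * d for _ in range(s)]
    for t in range(s):
        for k in range(e):
            for slot in range(c):
                w = combine[t][k][slot]
                if w:
                    for j in range(d):
                        out[t][j] += w * expert_out[k][slot][j]
    return out

# Two toy experts: one doubles its input, the other negates it.
experts = [lambda v: [2.0 * x for x in v], lambda v: [-x for x in v]]
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Token 0 goes to expert 0, token 1 to expert 1, token 2 is dropped (c = 1).
dispatch = [[[1], [0]], [[0], [1]], [[0], [0]]]
combine = [[[0.5], [0.0]], [[0.0], [1.0]], [[0.0], [0.0]]]
print(moe_forward(tokens, dispatch, combine, experts))
# [[1.0, 2.0], [-3.0, -4.0], [0.0, 0.0]]
```

A dropped token contributes nothing to any expert buffer, so its output row stays zero in this sketch.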