colossalai.nn.layer.moe.utils
- class colossalai.nn.layer.moe.utils.NormalNoiseGenerator(num_experts)
Generates a random noise mask for a logits tensor.
All noise is drawn from a normal distribution N(0, 1 / E^2), where E is the number of experts.
- Parameters
num_experts (int) – The number of experts
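As a rough illustration of the idea only, the sketch below adds zero-mean Gaussian noise to a list of logits using the Python standard library. It is not the library's implementation, which operates on tensors. The helper name add_normal_noise and its rng argument are hypothetical, and treating 1 / E^2 as the standard deviation (rather than the variance) is an assumption, since the description above does not say which is meant.

```python
import random


def add_normal_noise(logits, num_experts, rng=None):
    """Add zero-mean Gaussian noise to each logit (hypothetical helper)."""
    rng = rng or random.Random()
    # Assumption: 1 / E^2 is used as the standard deviation of the noise.
    std = 1.0 / num_experts ** 2
    return [x + rng.gauss(0.0, std) for x in logits]


logits = [0.5, -1.2, 2.0, 0.1]
# A seeded generator keeps the example reproducible.
noisy = add_normal_noise(logits, num_experts=4, rng=random.Random(0))
```

Because the noise scale shrinks as the number of experts grows, the perturbation stays small relative to typical logit values when there are many experts.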
- class colossalai.nn.layer.moe.utils.UniformNoiseGenerator(eps=0.01)
Generates a random noise mask for a logits tensor. Copied from Mesh TensorFlow: multiplies values by a random number between 1 - epsilon and 1 + epsilon. This makes models more resilient to rounding errors introduced by bfloat16, which seems particularly important for logits.
- Parameters
eps (float) – Epsilon; each value is multiplied by a random number between 1 - eps and 1 + eps
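The multiplicative jitter described above can be sketched in plain Python as follows. This is a minimal illustration, not the library's tensor-based code. The helper name multiply_uniform_noise and its rng argument are hypothetical, and drawing an independent factor for each element is an assumption.

```python
import random


def multiply_uniform_noise(logits, eps=0.01, rng=None):
    """Scale each logit by a factor drawn from [1 - eps, 1 + eps] (hypothetical helper)."""
    rng = rng or random.Random()
    # Assumption: an independent factor is drawn for every element.
    return [x * rng.uniform(1.0 - eps, 1.0 + eps) for x in logits]


logits = [0.5, -1.2, 2.0, 0.0]
# A seeded generator keeps the example reproducible.
jittered = multiply_uniform_noise(logits, eps=0.01, rng=random.Random(0))
```

Since the noise is multiplicative, each value moves by at most eps times its own magnitude, and a logit of exactly zero is left unchanged.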