colossalai.nn.layer.moe.utils

class colossalai.nn.layer.moe.utils.NormalNoiseGenerator(num_experts)

Generates a random noisy mask for a logits tensor.

All noise is drawn from a normal distribution N(0, 1 / E^2), where E is the number of experts.

Parameters

num_experts (int) – The number of experts
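The following is a minimal, framework-free sketch of the idea described above, not the library's implementation. It assumes that the noise is added element-wise to the logits and that 1 / E^2 is used as the standard deviation; the description above does not say whether 1 / E^2 is the variance or the standard deviation. The function name `add_normal_noise` and the `rng` argument are hypothetical and exist only for illustration.

```python
import random


def add_normal_noise(logits, num_experts, rng=None):
    """Return a copy of `logits` with zero-mean Gaussian noise added.

    Assumption: 1 / num_experts**2 is treated as the standard deviation.
    """
    if rng is None:
        rng = random.Random()
    scale = 1.0 / num_experts ** 2
    # One independent noise sample per logit.
    return [x + rng.gauss(0.0, scale) for x in logits]


# Four routing logits for a layer with four experts (illustrative values).
logits = [2.0, 0.5, -1.0, 0.0]
noisy = add_normal_noise(logits, num_experts=4, rng=random.Random(0))
```

Because the scale shrinks quadratically with the number of experts, the perturbation is small relative to typical logit values once there are more than a few experts.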

class colossalai.nn.layer.moe.utils.UniformNoiseGenerator(eps=0.01)

Generates a random noisy mask for a logits tensor. Copied from Mesh TensorFlow: multiplies values by a random number between 1 - epsilon and 1 + epsilon. This makes models more resilient to rounding errors introduced by bfloat16, which seems particularly important for logits.

Parameters

eps (float) – Epsilon of the generator; values are multiplied by a random number between 1 - eps and 1 + eps. Defaults to 0.01.
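The multiplicative scheme described above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the library's code: it assumes the random factor is sampled uniformly and independently per element, and the function name `apply_uniform_noise` and the `rng` argument are hypothetical.

```python
import random


def apply_uniform_noise(logits, eps=0.01, rng=None):
    """Return a copy of `logits` with each value scaled by a random factor.

    Assumption: factors are sampled uniformly from [1 - eps, 1 + eps],
    one independent sample per element.
    """
    if rng is None:
        rng = random.Random()
    return [x * rng.uniform(1.0 - eps, 1.0 + eps) for x in logits]


# Illustrative routing logits; a zero logit stays exactly zero.
logits = [2.0, 0.5, -1.0, 0.0]
noisy = apply_uniform_noise(logits, eps=0.01, rng=random.Random(0))
```

Unlike additive noise, the perturbation here is proportional to the magnitude of each logit, so the relative change of any element is bounded by eps.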