colossalai.nn.optimizer.lamb

Adapted from the pytorch-lamb library at https://github.com/cybertronai/pytorch-lamb

class colossalai.nn.optimizer.lamb.Lamb(params, lr=0.001, betas=(0.9, 0.999), eps=1e-06, weight_decay=0, adam=False)

Implements Lamb algorithm. It has been proposed in Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.

Parameters
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-6)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

  • adam (bool, optional) – always use trust ratio = 1, which turns this into Adam. Useful for comparison purposes.

step(closure=None)

Performs a single optimization step.

Parameters

closure (callable, optional) – A closure that reevaluates the model and returns the loss.