colossalai.nn.optimizer.hybrid_adam

class colossalai.nn.optimizer.hybrid_adam.HybridAdam(model_params, lr=0.001, bias_correction=True, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, adamw_mode=True, simd_log=False)[source]

Implements the Adam algorithm.

Supports parameter updates on both GPU and CPU, depending on the device of the parameters. The parameters and their gradients must be on the same device:

Parameters on CPU and gradients on CPU are allowed.

Parameters on GPU and gradients on GPU are allowed.

Parameters on GPU and gradients on CPU are not allowed.

Requires ColossalAI to be installed via pip install . (run from the root of the source tree).

This version of Hybrid Adam is a hybrid of CPUAdam and FusedAdam.

For parameters updating on CPU, it uses CPUAdam.
For parameters updating on GPU, it uses FusedAdam.
Hybrid-precision calculation with fp16 and fp32 is supported, e.g. fp32 parameters with fp16 gradients.
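
The device rules and the CPU/GPU dispatch described above can be sketched in a few lines. This is only an illustration: `select_backend` is a hypothetical helper, not part of ColossalAI, and devices are represented as plain strings ("cpu", "cuda") so the sketch needs no dependencies. It also assumes that any mismatch between parameter and gradient devices is rejected, which is how the same-device requirement reads.

```python
def select_backend(param_device: str, grad_device: str) -> str:
    """Return the name of the update path for one parameter.

    Hypothetical helper for illustration only; it mirrors the documented
    rules but is not how ColossalAI implements the dispatch.
    """
    # Parameters and gradients must live on the same device.
    if param_device != grad_device:
        raise ValueError(
            f"parameter on {param_device!r} but gradient on {grad_device!r}: "
            "parameters and gradients must share a device"
        )
    # CPU parameters are updated by the CPUAdam path.
    if param_device == "cpu":
        return "CPUAdam"
    # GPU parameters are updated by the FusedAdam path.
    if param_device == "cuda":
        return "FusedAdam"
    raise ValueError(f"unsupported device: {param_device!r}")


print(select_backend("cpu", "cpu"))
print(select_backend("cuda", "cuda"))
```

Calling `select_backend("cuda", "cpu")` raises a `ValueError`, matching the disallowed combination listed above.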

colossalai.nn.optimizer.HybridAdam may be used as a drop-in replacement for torch.optim.AdamW, or for torch.optim.Adam when adamw_mode=False.

Adam was proposed in "Adam: A Method for Stochastic Optimization".

Parameters

model_params (iterable) – iterable of parameters or dicts defining parameter groups.
lr (float, optional) – learning rate. (default: 1e-3)
betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square. (default: (0.9, 0.999))
eps (float, optional) – term added to the denominator to improve numerical stability. (default: 1e-8)
weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
amsgrad (boolean, optional) – whether to use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond". Not yet supported in CPUAdam. (default: False)
adamw_mode (boolean, optional) – whether to apply decoupled weight decay or L2 regularization. True for decoupled weight decay (also known as AdamW), False for L2 regularization. (default: True)
simd_log (boolean, optional) – whether to report if SIMD instructions are being used for acceleration. (default: False)
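
To make the effect of adamw_mode, weight_decay and bias_correction concrete, here is a minimal scalar sketch of one textbook Adam step. The function name `adam_step` is hypothetical, and the sketch follows the standard Adam and AdamW formulas rather than the CPUAdam or FusedAdam kernels, whose exact arithmetic ordering may differ.

```python
import math


def adam_step(param, grad, m, v, step, lr=1e-3, betas=(0.9, 0.999),
              eps=1e-8, weight_decay=0.0, bias_correction=True,
              adamw_mode=True):
    """One scalar Adam update; returns (new_param, new_m, new_v).

    Illustrative only: a textbook formulation, not the library kernel.
    """
    beta1, beta2 = betas
    if weight_decay and not adamw_mode:
        # L2 regularization: fold the decay term into the gradient.
        grad = grad + weight_decay * param
    # Running averages of the gradient and of its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    if bias_correction:
        bc1 = 1 - beta1 ** step
        bc2 = 1 - beta2 ** step
    else:
        bc1 = bc2 = 1.0
    update = (m / bc1) / (math.sqrt(v / bc2) + eps)
    if weight_decay and adamw_mode:
        # Decoupled weight decay (AdamW): decay is applied to the
        # parameter directly and bypasses the moment estimates.
        update = update + weight_decay * param
    return param - lr * update, m, v


# First step from zero-initialized moments, with weight decay enabled.
p_adamw, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, step=1,
                          weight_decay=0.1, adamw_mode=True)
p_l2, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, step=1,
                       weight_decay=0.1, adamw_mode=False)
print(f"{p_adamw:.4f} {p_l2:.4f}")  # prints "0.9989 0.9990"
```

On this first step the L2 variant moves the parameter by roughly lr regardless of the decay term, because the decay is normalized away together with the gradient, whereas the decoupled variant adds a separate lr * weight_decay * param shrinkage on top.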