colossalai.utils.gradient_accumulation
- colossalai.utils.gradient_accumulation.accumulate_gradient(model, optimizer, dataloader, accumulate_size, gradient_handlers=None, lr_scheduler=None)
- Parameters
model (torch.nn.Module) – your model object
optimizer (torch.optim.Optimizer) – your optimizer object
dataloader (Iterable) – your dataloader object
accumulate_size (int) – the number of steps to accumulate gradients
gradient_handlers (List[colossalai.engine.BaseGradientHandler]) – list of gradient handler objects. Default is None
lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – your lr scheduler object. Default is None
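The idea behind accumulating gradients over `accumulate_size` steps can be illustrated without the library. The following is a minimal sketch in plain Python, not the colossalai implementation: the toy model `y = w * x`, the helper names `grad_mse` and `accumulated_grad`, and the choice to scale each micro-batch gradient by `1 / accumulate_size` are all assumptions made for illustration. Under those assumptions, and with equally sized micro-batches, the accumulated gradient matches the gradient of the full batch.

```python
# Hypothetical, library-free illustration of gradient accumulation.
# None of these helper names come from colossalai.

def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, micro_batches, accumulate_size):
    """Sum micro-batch gradients, each scaled by 1 / accumulate_size (assumed scaling)."""
    total = 0.0
    for xs, ys in micro_batches:
        total += grad_mse(w, xs, ys) / accumulate_size
    return total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w = 0.5
accumulate_size = 4

# Split the 8 samples into 4 micro-batches of 2 samples each.
micro_batches = [(xs[i:i + 2], ys[i:i + 2]) for i in range(0, len(xs), 2)]

full_batch_grad = grad_mse(w, xs, ys)
accum_grad = accumulated_grad(w, micro_batches, accumulate_size)
```

The two values agree up to floating-point rounding, which is why a single optimizer update after `accumulate_size` small steps can stand in for one update on a larger batch.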
- class colossalai.utils.gradient_accumulation.GradAccumDataloader(dataloader, accumulate_size)
A wrapper for the dataloader to enable gradient accumulation by dropping the last incomplete steps.
For example, if a dataloader has 10 batches of data and the accumulate size is 4, the model parameters will be updated only twice, at step 4 and step 8. The last two batches of data do not form a complete 4-step cycle, so this class skips them automatically. If the dataloader is not a standard PyTorch dataloader (e.g. a DALI dataloader), this class will automatically consume the remaining 2 batches (loading the data without using it).
- Parameters
dataloader (Iterable) – Your dataloader object
accumulate_size (int) – The number of steps to accumulate gradients
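The 10-batch example above can be sketched as follows. This is a hypothetical stand-in written in plain Python, not the source of `GradAccumDataloader`: the class name `DropIncompleteLoader` and its attributes are invented for illustration, and it only covers the case where trailing batches are skipped, not the case where they are consumed for a non-standard dataloader.

```python
class DropIncompleteLoader:
    """Hypothetical stand-in that drops batches outside complete accumulation cycles."""

    def __init__(self, dataloader, accumulate_size):
        self.dataloader = dataloader
        self.accumulate_size = accumulate_size
        # Number of batches that fit into complete accumulation cycles.
        self.steps = (len(dataloader) // accumulate_size) * accumulate_size

    def __len__(self):
        return self.steps

    def __iter__(self):
        for index, batch in enumerate(self.dataloader):
            if index >= self.steps:
                break  # the remaining batches do not form a full cycle
            yield batch


batches = list(range(10))  # stands in for a dataloader with 10 batches
loader = DropIncompleteLoader(batches, accumulate_size=4)

seen = list(loader)
# 1-based steps at which a parameter update would happen.
update_steps = [i + 1 for i in range(len(seen)) if (i + 1) % 4 == 0]
```

Here `seen` holds the first 8 batches and `update_steps` is `[4, 8]`, matching the description: two updates, with the last two batches left out.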
- class colossalai.utils.gradient_accumulation.GradAccumOptimizer(optim, accumulate_size, model=None)
A wrapper for the optimizer to enable gradient accumulation by skipping the steps before the accumulation size is reached.
- Parameters
optim (torch.optim.Optimizer) – Your optimizer object
accumulate_size (int) – The number of steps to accumulate gradients
model (torch.nn.Module) – Your model object to check if it is DDP for special handling of no_sync() context
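The step-skipping behaviour can be sketched with a small wrapper around a dummy optimizer. This is an illustration under stated assumptions, not the `GradAccumOptimizer` source: the names `CountingOptimizer` and `SkippingOptimizer`, the internal counter, and the boolean return value of `step()` are hypothetical, and the DDP `no_sync()` handling tied to the `model` argument is not shown.

```python
class CountingOptimizer:
    """Dummy optimizer that records how often step() and zero_grad() really run."""

    def __init__(self):
        self.real_steps = 0
        self.real_zero_grads = 0

    def step(self):
        self.real_steps += 1

    def zero_grad(self):
        self.real_zero_grads += 1


class SkippingOptimizer:
    """Hypothetical wrapper: forwards step() only once per accumulation cycle."""

    def __init__(self, optim, accumulate_size):
        self.optim = optim
        self.accumulate_size = accumulate_size
        self.accumulate_step = 0

    def zero_grad(self):
        # Clear gradients only at the start of a new cycle so they can accumulate.
        if self.accumulate_step == 0:
            self.optim.zero_grad()

    def step(self):
        self.accumulate_step += 1
        if self.accumulate_step < self.accumulate_size:
            return False  # still accumulating, skip the real update
        self.optim.step()
        self.accumulate_step = 0
        return True


inner = CountingOptimizer()
optim = SkippingOptimizer(inner, accumulate_size=4)

applied_at = []
for call in range(1, 11):  # 10 training iterations
    optim.zero_grad()
    if optim.step():
        applied_at.append(call)
```

With 10 iterations and an accumulation size of 4, the wrapped optimizer is stepped at iterations 4 and 8 only. The same counting pattern would apply to a wrapper around an LR scheduler or a gradient handler.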
- class colossalai.utils.gradient_accumulation.GradAccumLrSchedulerByStep(lr_scheduler, accumulate_size)
A wrapper for the LR scheduler to enable gradient accumulation by skipping the steps before the accumulation size is reached.
- Parameters
lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – Your lr scheduler object
accumulate_size (int) – The number of steps to accumulate gradients
- class colossalai.utils.gradient_accumulation.GradAccumGradientHandler(grad_handler, accumulate_size)
A wrapper for the gradient handler to enable gradient accumulation by skipping the steps before the accumulation size is reached.
- Parameters
grad_handler (colossalai.engine.BaseGradientHandler) – Your gradient handler object
accumulate_size (int) – The number of steps to accumulate gradients