colossalai.utils.gradient_accumulation
- colossalai.utils.gradient_accumulation.accumulate_gradient(model, optimizer, dataloader, accumulate_size, gradient_handlers=None, lr_scheduler=None)
Turns the model, optimizer, and dataloader into their corresponding objects for gradient accumulation.
- Parameters
model (torch.nn.Module) – Your model object for gradient accumulation.
optimizer (torch.optim.Optimizer) – Your optimizer object for gradient accumulation.
dataloader (torch.utils.data.DataLoader or iterable object) – Your dataloader object; it will be used as iter(dataloader).
accumulate_size (int) – The number of steps to accumulate gradients.
gradient_handlers (List[colossalai.engine.BaseGradientHandler]) – List of gradient handler objects. Defaults to None.
lr_scheduler (torch.optim.lr_scheduler or colossalai.nn.lr_scheduler) – Your lr_scheduler object for gradient accumulation. Defaults to None.
More details about gradient_handlers can be found in Gradient_handler.
More details about lr_scheduler can be found in lr_scheduler and in how to adjust learning rate.
- class colossalai.utils.gradient_accumulation.GradAccumDataloader(dataloader, accumulate_size)
A wrapper for the dataloader that enables gradient accumulation by dropping the last incomplete steps.
Note
The dataloader drops the last incomplete steps for gradient accumulation. For example, if a dataloader has 10 batches of data and the accumulate size is 4, the model parameters are updated only twice, at step 4 and step 8. The last two batches do not form a complete 4-step cycle, so this class skips them automatically. If the dataloader is not a standard PyTorch dataloader (e.g. a DALI dataloader), this class automatically consumes the remaining 2 batches (loading the data without using it).
- Parameters
dataloader (Iterable) – Your dataloader object for gradient accumulation.
accumulate_size (int) – The number of steps to accumulate gradients.
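The arithmetic in the note above can be checked with a small standalone sketch. This is not the GradAccumDataloader implementation; `truncated_batches` is a hypothetical helper, written in plain Python, that only illustrates which batches belong to a complete accumulation cycle.

```python
def truncated_batches(batches, accumulate_size):
    """Yield only the batches that belong to a complete accumulation cycle.

    Hypothetical helper for illustration; it assumes `batches` supports len().
    """
    # Number of batches that fit into whole cycles of `accumulate_size` steps.
    usable = len(batches) - len(batches) % accumulate_size
    for index, batch in enumerate(batches):
        if index >= usable:
            break  # the trailing incomplete cycle is skipped
        yield batch


# 10 batches with an accumulate size of 4, as in the note above.
batches = [f"batch_{i}" for i in range(1, 11)]
kept = list(truncated_batches(batches, accumulate_size=4))

# Parameters are updated whenever a full cycle of 4 steps has completed.
update_steps = [step for step in range(1, len(kept) + 1) if step % 4 == 0]

print(len(kept), update_steps)  # 8 [4, 8]
```

With 10 batches and an accumulate size of 4, batches 9 and 10 are never yielded, which matches the two updates described in the note.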
- class colossalai.utils.gradient_accumulation.GradAccumOptimizer(optim, accumulate_size, model=None)
A wrapper for the optimizer that enables gradient accumulation by skipping the steps before the accumulation size is reached.
- Parameters
optim (torch.optim.Optimizer) – Your optimizer object for gradient accumulation.
accumulate_size (int) – The number of steps to accumulate gradients.
model (torch.nn.Module) – Your model object to check if it is DistributedDataParallel for special handling of no_sync() context.
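The step-skipping idea can be sketched without any framework. The classes below are hypothetical stand-ins, not the GradAccumOptimizer source: `CountingOptimizer` only counts calls, and `AccumulatingOptimizer` forwards `step()` once every `accumulate_size` calls. Clearing gradients only at the start of a cycle is an assumption of this sketch, and the DistributedDataParallel `no_sync()` handling mentioned above is left out.

```python
class CountingOptimizer:
    """Hypothetical stand-in for a real optimizer; it only counts calls."""

    def __init__(self):
        self.step_calls = 0
        self.zero_grad_calls = 0

    def step(self):
        self.step_calls += 1

    def zero_grad(self):
        self.zero_grad_calls += 1


class AccumulatingOptimizer:
    """Illustrative wrapper: forwards step() every `accumulate_size` calls."""

    def __init__(self, optim, accumulate_size):
        self.optim = optim
        self.accumulate_size = accumulate_size
        self.accumulated = 0  # steps seen in the current cycle

    def zero_grad(self):
        # Assumption: gradients are cleared only at the start of a cycle,
        # so gradients from earlier steps in the cycle are preserved.
        if self.accumulated == 0:
            self.optim.zero_grad()

    def step(self):
        self.accumulated += 1
        if self.accumulated == self.accumulate_size:
            self.optim.step()  # the real parameter update
            self.accumulated = 0


inner = CountingOptimizer()
wrapped = AccumulatingOptimizer(inner, accumulate_size=4)

# Eight training steps written exactly like a loop without accumulation.
for _ in range(8):
    wrapped.zero_grad()
    wrapped.step()

print(inner.step_calls, inner.zero_grad_calls)  # 2 2
```

The training loop keeps calling `zero_grad()` and `step()` on every iteration; the wrapper decides which calls reach the underlying optimizer.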
- class colossalai.utils.gradient_accumulation.GradAccumLrSchedulerByStep(lr_scheduler, accumulate_size)
A wrapper for the LR scheduler that enables gradient accumulation by skipping the steps before the accumulation size is reached.
- Parameters
lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – Your lr_scheduler object for gradient accumulation.
accumulate_size (int) – The number of steps to accumulate gradients.
- class colossalai.utils.gradient_accumulation.GradAccumGradientHandler(grad_handler, accumulate_size)
A wrapper for the gradient handler that enables gradient accumulation by skipping the steps before the accumulation size is reached.
- Parameters
grad_handler (colossalai.engine.BaseGradientHandler) – Your gradient_handler object for gradient accumulation; it is called once accumulate_size is reached.
accumulate_size (int) – The number of steps to accumulate gradients.
More details about gradient_handlers can be found in Gradient_handler.