colossalai.utils.gradient_accumulation

colossalai.utils.gradient_accumulation.accumulate_gradient(model, optimizer, dataloader, accumulate_size, gradient_handlers=None, lr_scheduler=None)
Parameters
  • model (torch.nn.Module) – your model object

  • optimizer (torch.optim.Optimizer) – your optimizer object

  • dataloader (Iterable) – your dataloader object

  • accumulate_size (int) – the number of steps to accumulate gradients

  • gradient_handlers (List[colossalai.engine.BaseGradientHandler]) – list of gradient handler objects. Default is None

  • lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – your lr scheduler object. Default is None
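
The idea behind accumulate_size can be illustrated with a small, library-free sketch. It assumes a toy one-parameter model with a mean-squared-error loss (the function grad_mse and the data are made up for illustration) and shows that averaging the gradients of accumulate_size equally sized micro-batches gives the same gradient as one large batch. Whether and where such scaling happens inside ColossalAI is not stated in this reference, so treat the division below as an assumption.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w (toy model, illustration only)."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

accumulate_size = 4
micro = len(xs) // accumulate_size  # 2 samples per micro-batch

# Gradient computed on the whole batch at once.
full_grad = grad_mse(w, xs, ys)

# Gradient accumulated over 4 micro-batches before a single update.
accumulated = 0.0
for i in range(accumulate_size):
    lo, hi = i * micro, (i + 1) * micro
    # Assumption: each micro-batch gradient is scaled so the sum is an average over the cycle.
    accumulated += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accumulate_size

print(full_grad, accumulated)
```

Both values agree, which is why accumulating over several steps can stand in for a larger batch when memory is limited.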

class colossalai.utils.gradient_accumulation.GradAccumDataloader(dataloader, accumulate_size)

A wrapper for the dataloader that enables gradient accumulation by dropping the last incomplete steps.

For example, if a dataloader has 10 batches of data and the accumulate size is 4, the model parameters will be updated only twice, at step 4 and step 8. The last two batches do not form a complete 4-step cycle, so this class skips them automatically. If the dataloader is not a standard PyTorch dataloader (e.g. a DALI dataloader), this class will automatically consume the remaining 2 batches, loading their data without using it.
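
The 10-batch example above can be sketched in plain Python. TruncatingLoader is a hypothetical stand-in written for this illustration, not the ColossalAI class, and it only covers the truncation; it does not model the extra consumption of leftover batches described for non-PyTorch dataloaders.

```python
class TruncatingLoader:
    """Hypothetical stand-in: yields only the batches that form full accumulation cycles."""

    def __init__(self, dataloader, accumulate_size):
        self.dataloader = dataloader
        self.accumulate_size = accumulate_size
        # Largest multiple of accumulate_size that fits in the loader.
        self.steps_per_epoch = len(dataloader) - len(dataloader) % accumulate_size

    def __len__(self):
        return self.steps_per_epoch

    def __iter__(self):
        for step, batch in enumerate(self.dataloader):
            if step >= self.steps_per_epoch:
                break  # the trailing batches cannot complete a cycle
            yield batch


batches = list(range(1, 11))  # 10 batches, numbered 1..10
loader = TruncatingLoader(batches, accumulate_size=4)

kept = list(loader)
update_steps = [s for s in kept if s % 4 == 0]

print(kept)          # batches 1..8 are used
print(update_steps)  # parameter updates happen at steps 4 and 8
```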

Parameters
  • dataloader (Iterable) – Your dataloader object

  • accumulate_size (int) – The number of steps to accumulate gradients

class colossalai.utils.gradient_accumulation.GradAccumOptimizer(optim, accumulate_size, model=None)

A wrapper for the optimizer that enables gradient accumulation by skipping optimizer steps until the accumulation size is reached.

Parameters
  • optim (torch.optim.Optimizer) – Your optimizer object

  • accumulate_size (int) – The number of steps to accumulate gradients

  • model (torch.nn.Module) – Your model object, used to check whether it is wrapped in DDP so that the no_sync() context can be handled specially. Default is None
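
A minimal sketch of the step-skipping behaviour described above, under stated assumptions: SkippingOptimizer and CountingOptimizer are hypothetical names invented for this illustration, the wrapped object only needs step() and zero_grad() methods, and the DDP no_sync() handling tied to the model parameter is left out entirely.

```python
class SkippingOptimizer:
    """Hypothetical wrapper: forwards step() and zero_grad() once per accumulation cycle."""

    def __init__(self, optim, accumulate_size):
        self.optim = optim
        self.accumulate_size = accumulate_size
        self.accumulate_step = 0

    def zero_grad(self):
        # Keep the accumulated gradients while a cycle is in progress.
        if self.accumulate_step == 0:
            self.optim.zero_grad()

    def step(self):
        self.accumulate_step += 1
        if self.accumulate_step < self.accumulate_size:
            return  # still accumulating, no parameter update yet
        self.accumulate_step = 0
        self.optim.step()


class CountingOptimizer:
    """Toy optimizer that only records how often it is really invoked."""

    def __init__(self):
        self.steps = 0
        self.resets = 0

    def step(self):
        self.steps += 1

    def zero_grad(self):
        self.resets += 1


inner = CountingOptimizer()
optimizer = SkippingOptimizer(inner, accumulate_size=4)

for _ in range(8):  # 8 micro-batches form 2 full cycles
    optimizer.zero_grad()
    # the forward and backward passes would go here
    optimizer.step()

print(inner.steps, inner.resets)
```

The training loop keeps its usual shape of zero_grad(), backward, step() on every batch; only the wrapper decides when those calls reach the real optimizer.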

class colossalai.utils.gradient_accumulation.GradAccumLrSchedulerByStep(lr_scheduler, accumulate_size)

A wrapper for the LR scheduler that enables gradient accumulation by skipping scheduler steps until the accumulation size is reached.

Parameters
  • lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – Your lr scheduler object

  • accumulate_size (int) – The number of steps to accumulate gradients
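
The effect on a per-step learning-rate schedule can be sketched as follows. HalvingScheduler and SkippingScheduler are hypothetical toy classes for this illustration only; the sketch assumes the wrapper forwards one scheduler step per completed accumulation cycle, so the learning rate changes per parameter update rather than per batch.

```python
class HalvingScheduler:
    """Toy scheduler: halves the learning rate on every real step."""

    def __init__(self, lr):
        self.lr = lr

    def step(self):
        self.lr /= 2


class SkippingScheduler:
    """Hypothetical wrapper: forwards step() only when a cycle completes."""

    def __init__(self, lr_scheduler, accumulate_size):
        self.lr_scheduler = lr_scheduler
        self.accumulate_size = accumulate_size
        self.accumulate_step = 0

    def step(self):
        self.accumulate_step += 1
        if self.accumulate_step < self.accumulate_size:
            return  # the schedule does not advance mid-cycle
        self.accumulate_step = 0
        self.lr_scheduler.step()


inner = HalvingScheduler(lr=1.0)
scheduler = SkippingScheduler(inner, accumulate_size=4)

history = []
for _ in range(8):  # one call per micro-batch
    scheduler.step()
    history.append(inner.lr)

print(history)
```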

class colossalai.utils.gradient_accumulation.GradAccumGradientHandler(grad_handler, accumulate_size)

A wrapper for the gradient handler that enables gradient accumulation by skipping gradient handling steps until the accumulation size is reached.

Parameters
  • grad_handler (colossalai.engine.BaseGradientHandler) – Your gradient handler object

  • accumulate_size (int) – The number of steps to accumulate gradients