colossalai.utils.gradient_accumulation

colossalai.utils.gradient_accumulation.accumulate_gradient(model, optimizer, dataloader, accumulate_size, gradient_handlers=None, lr_scheduler=None)

Parameters

model (torch.nn.Module) – your model object
optimizer (torch.optim.Optimizer) – your optimizer object
dataloader (Iterable) – your dataloader object
accumulate_size (int) – the number of steps to accumulate gradients
gradient_handlers (List[colossalai.engine.BaseGradientHandler]) – list of gradient handler objects. Default is None
lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – your lr scheduler object. Default is None

class colossalai.utils.gradient_accumulation.GradAccumDataloader(dataloader, accumulate_size)

A wrapper for a dataloader that enables gradient accumulation by dropping the trailing batches that do not form a complete accumulation cycle.

For example, if a dataloader has 10 batches of data and the accumulate size is 4, the model parameters are updated only twice, at step 4 and step 8. The last two batches do not form a complete 4-step cycle, so this class skips them automatically. If the dataloader is not a standard PyTorch dataloader (e.g. a DALI dataloader), this class automatically consumes the remaining 2 batches (loading the data without using it).
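The arithmetic in the example above can be checked with a small pure-Python sketch. The class name DropIncompleteCycles is hypothetical and this is not the ColossalAI implementation; it assumes the wrapped object supports len() and iteration, and it does not model the extra consumption step described for non-PyTorch dataloaders.

```python
class DropIncompleteCycles:
    """Hypothetical stand-in that illustrates dropping trailing batches."""

    def __init__(self, dataloader, accumulate_size):
        self.dataloader = dataloader
        self.accumulate_size = accumulate_size
        # Number of batches that fit into complete accumulation cycles.
        total = len(dataloader)
        self.steps_per_epoch = total - total % accumulate_size

    def __len__(self):
        return self.steps_per_epoch

    def __iter__(self):
        for step, batch in enumerate(self.dataloader):
            if step >= self.steps_per_epoch:
                break  # the remaining batches form an incomplete cycle
            yield batch


# 10 batches, numbered 1..10 so the number matches the step count.
batches = list(range(1, 11))
loader = DropIncompleteCycles(batches, accumulate_size=4)

seen = list(loader)
# Steps at which a parameter update would happen.
update_steps = [b for b in seen if b % 4 == 0]
print(seen)
print(update_steps)
```

With 10 batches and an accumulate size of 4, the sketch yields batches 1 through 8 and marks steps 4 and 8 as the update points.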

Parameters

dataloader (Iterable) – Your dataloader object
accumulate_size (int) – The number of steps to accumulate gradients

class colossalai.utils.gradient_accumulation.GradAccumOptimizer(optim, accumulate_size, model=None)

A wrapper for the optimizer that enables gradient accumulation by skipping optimizer steps until the accumulation size is reached.

Parameters

optim (torch.optim.Optimizer) – Your optimizer object
accumulate_size (int) – The number of steps to accumulate gradients
model (torch.nn.Module) – Your model object, used to check whether it is a DDP model that needs special handling of the no_sync() context. Default is None
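The skipping behaviour can be illustrated with a minimal sketch that uses no PyTorch at all. CountingOptimizer and SkipUntilAccumulated are hypothetical names; the sketch assumes that gradients are cleared only at the start of a cycle and leaves out the DDP no_sync() handling, so it should be read as an illustration of the idea rather than the library's code.

```python
class CountingOptimizer:
    """Hypothetical optimizer that only records how often it is called."""

    def __init__(self):
        self.step_calls = 0
        self.zero_grad_calls = 0

    def step(self):
        self.step_calls += 1

    def zero_grad(self):
        self.zero_grad_calls += 1


class SkipUntilAccumulated:
    """Hypothetical wrapper: forwards step() once per accumulate_size calls."""

    def __init__(self, optim, accumulate_size):
        self.optim = optim
        self.accumulate_size = accumulate_size
        self.accumulate_step = 0

    def zero_grad(self):
        # Assumption: gradients are kept while a cycle is in progress.
        if self.accumulate_step == 0:
            self.optim.zero_grad()

    def step(self):
        self.accumulate_step += 1
        if self.accumulate_step < self.accumulate_size:
            return False  # still accumulating, skip the real step
        self.optim.step()
        self.accumulate_step = 0
        return True


inner = CountingOptimizer()
wrapped = SkipUntilAccumulated(inner, accumulate_size=4)

applied = []
for _ in range(8):  # 8 batches, i.e. two complete cycles
    wrapped.zero_grad()
    # forward and backward passes would go here
    applied.append(wrapped.step())
print(applied, inner.step_calls, inner.zero_grad_calls)
```

Over 8 batches the wrapped optimizer performs a real step only on the 4th and 8th calls, which matches the two updates in the dataloader example.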

class colossalai.utils.gradient_accumulation.GradAccumLrSchedulerByStep(lr_scheduler, accumulate_size)

A wrapper for the LR scheduler that enables gradient accumulation by skipping scheduler steps until the accumulation size is reached.

Parameters

lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – Your lr scheduler object
accumulate_size (int) – The number of steps to accumulate gradients
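To show the effect on the learning rate itself, the following sketch wraps a toy scheduler. HalvingScheduler and StepEveryN are hypothetical names introduced for illustration; the real wrapper operates on a torch.optim.lr_scheduler object and may differ in detail.

```python
class HalvingScheduler:
    """Hypothetical scheduler that halves the learning rate on every step()."""

    def __init__(self, lr):
        self.lr = lr

    def step(self):
        self.lr *= 0.5


class StepEveryN:
    """Hypothetical wrapper: advances the schedule once per accumulate_size calls."""

    def __init__(self, lr_scheduler, accumulate_size):
        self.lr_scheduler = lr_scheduler
        self.accumulate_size = accumulate_size
        self.accumulate_step = 0

    def step(self):
        self.accumulate_step += 1
        if self.accumulate_step == self.accumulate_size:
            self.lr_scheduler.step()  # one real step per complete cycle
            self.accumulate_step = 0


inner = HalvingScheduler(lr=1.0)
scheduler = StepEveryN(inner, accumulate_size=4)

lrs = []
for _ in range(8):  # called once per batch
    scheduler.step()
    lrs.append(inner.lr)
print(lrs)
```

The learning rate changes only after the 4th and 8th calls, so the schedule advances once per parameter update rather than once per batch.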

class colossalai.utils.gradient_accumulation.GradAccumGradientHandler(grad_handler, accumulate_size)

A wrapper for the gradient handler that enables gradient accumulation by skipping gradient handling until the accumulation size is reached.

Parameters

grad_handler (colossalai.engine.BaseGradientHandler) – Your gradient handler object
accumulate_size (int) – The number of steps to accumulate gradients