colossalai.utils

colossalai.utils.checkpoint(function, activation_offload, *args)

Checkpoint the computation while preserving the RNG states. Adapted from PyTorch's torch.utils.checkpoint.

Parameters
  • function – The function that performs the forward pass. It must know how to handle the input tuple.

  • args – Tuple containing the arguments passed to function

Returns

Output of running function with provided args
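
The idea of preserving RNG states can be illustrated without any deep-learning framework. The sketch below is not the library's implementation: it uses Python's random module as a stand-in for the framework RNG, and the helper names (run_with_rng_snapshot, recompute, noisy_forward) are hypothetical. It shows why the state must be saved before the forward pass: a recomputation that replays the same random draws reproduces the original output exactly.

```python
import random


def run_with_rng_snapshot(function, *args):
    """Run `function` and return its output plus the RNG state it started from."""
    state = random.getstate()      # snapshot taken before the forward pass
    return function(*args), state


def recompute(function, state, *args):
    """Re-run `function` from a saved RNG state, then restore the caller's state."""
    current = random.getstate()
    random.setstate(state)         # replay the same random draws
    try:
        return function(*args)
    finally:
        random.setstate(current)   # leave the surrounding RNG stream untouched


def noisy_forward(x):
    # Stand-in for a forward pass with dropout: randomly zeroes elements.
    return [v if random.random() > 0.5 else 0.0 for v in x]


random.seed(0)
inputs = [1.0, 2.0, 3.0, 4.0]
first, saved_state = run_with_rng_snapshot(noisy_forward, inputs)
random.random()                    # an unrelated draw advances the global RNG
second = recompute(noisy_forward, saved_state, inputs)
print(first == second)
```

Without the saved state, the second call would draw different random numbers and the recomputed activations would no longer match the ones used in the original forward pass.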

colossalai.utils.print_rank_0(msg, logger=None)

Print a message and, optionally, write it to a logger. The message is emitted only on the rank-0 process.

Parameters
  • msg (str) – A string message to output

  • logger (optional) – Python logger object, defaults to None

colossalai.utils.sync_model_param(model, parallel_mode)

Make sure model parameters are consistent across processes in the given parallel mode (for example, data parallel mode).

Parameters
  • model (torch.nn.Module) – A PyTorch nn.Module whose parameters are checked for consistency

  • parallel_mode (colossalai.context.ParallelMode) – Parallel mode to be checked

colossalai.utils.clip_grad_norm_fp32(parameters, max_norm, norm_type=2)

Clips gradient norm of an iterable of parameters whose gradients are in fp32.

This is adapted from torch.nn.utils.clip_grad.clip_grad_norm_(), with added functionality to handle model parallel parameters. Note that the gradients are modified in place.

Parameters
  • parameters (Iterable[Tensor] or Tensor) – An iterable of Tensors or a single Tensor that will have gradients normalized

  • max_norm (float or int) – Max norm of the gradients

  • norm_type (float or int) – Type of the used p-norm. Can be 'inf' for infinity norm.

Returns

Total norm of the parameters (viewed as a single vector).

Return type

float
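
The clipping rule itself can be sketched in plain Python. The following is an illustration under stated assumptions, not the library's code: gradients are plain lists of floats rather than tensors, the model-parallel handling is omitted, and the function name clip_grad_norm and the 1e-6 stabilizer (the value used by PyTorch's clip_grad_norm_) are choices made for this example.

```python
import math


def clip_grad_norm(grads, max_norm, norm_type=2.0):
    """Scale `grads` (a list of lists of floats) in place.

    Returns the total norm measured before clipping, with all gradients
    viewed as a single vector.
    """
    flat = [g for grad in grads for g in grad]
    if norm_type == math.inf:
        total_norm = max((abs(g) for g in flat), default=0.0)
    else:
        total_norm = sum(abs(g) ** norm_type for g in flat) ** (1.0 / norm_type)
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:            # only ever shrink gradients, never grow them
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]   # two "parameters" with two elements each
total = clip_grad_norm(grads, max_norm=1.0)
print(total, grads)
```

Because the norm is computed over all gradients together, every gradient is scaled by the same coefficient, so the direction of the overall update is preserved.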

colossalai.utils.get_current_device()

Returns the index of the currently selected device (gpu/cpu).

colossalai.utils.synchronize()

Similar to torch.cuda.synchronize(). Waits for all kernels in all streams on a CUDA device to complete.

colossalai.utils.empty_cache()

Similar to torch.cuda.empty_cache(). Releases all unoccupied cached memory currently held by the caching allocator.

colossalai.utils.set_to_cuda(models)

Send a model, or a list of models, to the GPU.

Parameters

models – An nn.Module or a list of modules

colossalai.utils.report_memory_usage(message, logger=None, report_cpu=False)

Calculate and print RAM usage (in GB)

Parameters
Raises

EnvironmentError – Raise error if no distributed environment has been initialized

class colossalai.utils.Timer

A timer object which helps to log the execution times, and provides different tools to assess the times.

start()

First synchronizes CUDA, then resets the clock and starts the timer.

lap()

Records a lap and returns the elapsed time.

stop(keep_in_history=False)

Stop the timer and record the start-stop time interval.

Parameters

keep_in_history (bool, optional) – Whether to record each start-stop interval in the history, defaults to False

Returns

Start-stop interval

Return type

int

get_history_mean()

Mean of all history start-stop time intervals.

Returns

Mean of time intervals

Return type

int

get_history_sum()

Add up all the start-stop time intervals.

Returns

Sum of time intervals

Return type

int

get_elapsed_time()

Return the last start-stop time interval.

Note

Use it only when timer is not in progress

Returns

The last time interval

Return type

int

reset()

Clear up the timer and its history
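
How the history-related methods relate to one another can be sketched with a small stand-in class. This is a hypothetical SimpleTimer written for illustration, not the library's Timer: it uses time.perf_counter and omits the CUDA synchronization and lap() described above.

```python
import time


class SimpleTimer:
    """Hypothetical stand-in: start-stop intervals with an optional history."""

    def __init__(self):
        self._start = None
        self._elapsed = 0.0
        self._history = []

    def start(self):
        self._start = time.perf_counter()

    def stop(self, keep_in_history=False):
        self._elapsed = time.perf_counter() - self._start
        self._start = None
        if keep_in_history:
            self._history.append(self._elapsed)
        return self._elapsed

    def get_history_sum(self):
        return sum(self._history)

    def get_history_mean(self):
        return sum(self._history) / len(self._history)

    def get_elapsed_time(self):
        # Last start-stop interval; meaningful only while the timer is stopped.
        return self._elapsed

    def reset(self):
        self._start, self._elapsed, self._history = None, 0.0, []


timer = SimpleTimer()
for _ in range(3):
    timer.start()
    time.sleep(0.01)               # stand-in for one training step
    timer.stop(keep_in_history=True)

timer.start()
time.sleep(0.01)
untracked = timer.stop()           # not recorded: keep_in_history is False
print(timer.get_history_mean(), timer.get_history_sum())
```

In this sketch only intervals stopped with keep_in_history=True contribute to the mean and the sum, while get_elapsed_time() always reflects the most recent interval.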

class colossalai.utils.MultiTimer(on=True)

An object that contains multiple timers.

Parameters

on (bool, optional) – Whether the timer is enabled. Default is True

start(name)

Start the timer with the given name.

Parameters

name (str) – Timer’s key

stop(name, keep_in_history)

Stop the timer with the given name.

Parameters
  • name (str) – Timer’s key

  • keep_in_history (bool) – Whether to record each start-stop interval in the history

get_timer(name)

Get timer by its name (from multitimer)

Parameters

name – Timer’s key

Returns

The timer registered under the given name

Return type

Timer

reset(name=None)

Reset timers.

Parameters

name (optional) – If name is designated, the named timer will be reset and others will not, defaults to None

colossalai.utils.accumulate_gradient(model, optimizer, dataloader, accumulate_size, gradient_handlers=None, lr_scheduler=None)

Parameters
  • model (torch.nn.Module) – your model object

  • optimizer (torch.optim.Optimizer) – your optimizer object

  • dataloader (Iterable) – your dataloader object

  • accumulate_size (int) – the number of steps to accumulate gradients

  • gradient_handlers (List[colossalai.engine.BaseGradientHandler]) – list of gradient handler objects. Default is None

  • lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – your lr scheduler object. Default is None
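
As a rough illustration of what accumulate_size controls, the sketch below performs gradient accumulation on a one-parameter model in plain Python. It is not the library's implementation: the function train_with_accumulation, the learning rate, and the choice to average gradients over the accumulation window are all assumptions made for this example.

```python
def train_with_accumulation(weight, batches, accumulate_size, lr=0.1):
    """Fit y = weight * x with SGD, stepping once every `accumulate_size` batches."""
    grad_sum = 0.0
    updates = 0
    for step, (x, y) in enumerate(batches, start=1):
        # Gradient of 0.5 * (weight * x - y) ** 2 with respect to weight,
        # divided so the accumulated value is the mean over the window.
        grad_sum += (weight * x - y) * x / accumulate_size
        if step % accumulate_size == 0:
            weight -= lr * grad_sum    # plays the role of optimizer.step()
            grad_sum = 0.0             # plays the role of optimizer.zero_grad()
            updates += 1
    return weight, updates


batches = [(1.0, 2.0), (2.0, 4.0), (1.0, 2.0), (2.0, 4.0)]
weight, updates = train_with_accumulation(0.0, batches, accumulate_size=2)
print(weight, updates)
```

With four batches and accumulate_size=2, the parameter is updated twice, each time using gradients gathered from two batches. This trades more forward and backward passes per update for a larger effective batch size.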

class colossalai.utils.DataParallelSampler(dataset, shuffle=False, seed=0, drop_last=False)

A data sampler for distributed data parallelism

Parameters
  • dataset (torch.utils.data.Dataset) – A Dataset instance

  • shuffle (bool, optional) – Whether to shuffle data, defaults to False

  • seed (int, optional) – The random seed, defaults to 0

  • drop_last (bool, optional) – Set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller, defaults to False

set_epoch(epoch)

Sets the epoch for this sampler. When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

Parameters

epoch (int) – Epoch number.
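
The role of seed and set_epoch can be sketched with a small function that shards dataset indices across ranks. This is an illustration under assumptions, not the library's sampler: the name shard_indices and the seed + epoch scheme are choices for this example (the latter borrowed from PyTorch's DistributedSampler), and padding or dropping for dataset sizes not divisible by the number of ranks is ignored.

```python
import random


def shard_indices(dataset_len, rank, world_size, shuffle=False, seed=0, epoch=0):
    """Return the dataset indices assigned to `rank` for the given epoch."""
    indices = list(range(dataset_len))
    if shuffle:
        # Every rank seeds with the same value, so all ranks agree on the
        # permutation; changing `epoch` changes the seed and the ordering.
        random.Random(seed + epoch).shuffle(indices)
    return indices[rank::world_size]   # interleaved, non-overlapping shards


epoch0 = [shard_indices(8, r, 2, shuffle=True, seed=0, epoch=0) for r in range(2)]
epoch1 = [shard_indices(8, r, 2, shuffle=True, seed=0, epoch=1) for r in range(2)]
print(epoch0, epoch1)
```

Within an epoch the shards of the two ranks are disjoint and together cover the whole dataset. If the epoch value is never updated, every epoch reuses the same permutation, which is why set_epoch should be called at the start of each epoch when shuffle=True.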

colossalai.utils.get_dataloader(dataset, shuffle=False, seed=1024, add_sampler=True, drop_last=False, pin_memory=False, num_workers=0, **kwargs)

Set up a deterministic dataloader. This also configures worker seeding, the sampler, and whether the data is shuffled.

Note

When pipeline parallelism is enabled, shuffle cannot be True, as it would cause a mismatch between the input data on the first stage and the labels on the last stage.

Parameters
  • dataset (torch.utils.data.Dataset) – A torch.utils.data.Dataset object

  • shuffle (bool, optional. Default is False) – Whether to shuffle the dataset

  • seed (int, optional. Default is 1024) – Random seed for the dataloader workers

  • add_sampler (bool, optional. Default is True) – Whether to add a distributed data-parallel sampler to the dataset

  • drop_last (bool, optional. Default is False) – Drop the last incomplete batch of data

  • pin_memory (bool, optional. Default is False) – Whether to pin memory address in CPU memory

  • num_workers (int, optional. Default is 0) – Number of worker processes for this dataloader

Returns

An object of torch.utils.data.DataLoader

Return type

torch.utils.data.DataLoader