colossalai.utils

colossalai.utils.checkpoint(function, activation_offload, *args)

Checkpoint the computation while preserving the RNG states. Modified from PyTorch's torch.utils.checkpoint.

Parameters

function – The forward-pass function to checkpoint. It must be able to handle the inputs passed in args.
activation_offload (bool) – Whether to offload activations to CPU memory
args – Positional arguments passed to function

Returns

Output of running function with provided args
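Why the RNG states must be preserved can be sketched in pure Python. Checkpointing keeps only the inputs during the forward pass and re-runs function later. If function draws random numbers (dropout, for example), the recomputation must replay exactly the same draws. CheckpointSketch and the dropout helper below are hypothetical illustrations built on Python's random module, not the ColossalAI implementation, which works on PyTorch and CUDA RNG states.

```python
import random

def dropout(xs, p, rng):
    # Zero each element with probability p and rescale survivors by 1 / (1 - p)
    return [0.0 if rng.random() < p else x / (1 - p) for x in xs]

class CheckpointSketch:
    """Hypothetical: stores the inputs and RNG state instead of intermediate results."""

    def __init__(self, function):
        self.function = function

    def forward(self, rng, *args):
        self.saved_args = args
        self.saved_state = rng.getstate()  # RNG state before the forward pass
        return self.function(rng, *args)

    def recompute(self, rng):
        current_state = rng.getstate()
        rng.setstate(self.saved_state)     # replay the same random draws
        out = self.function(rng, *self.saved_args)
        rng.setstate(current_state)        # leave the caller's RNG stream untouched
        return out

rng = random.Random(0)
ckpt = CheckpointSketch(lambda r, xs: dropout(xs, 0.5, r))
first = ckpt.forward(rng, [1.0, 2.0, 3.0, 4.0])
again = ckpt.recompute(rng)  # same dropout mask as the forward pass
```

Without the setstate() call in recompute(), the second run would draw fresh random numbers and produce a different dropout mask than the one used in the forward pass.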

colossalai.utils.print_rank_0(msg, logger=None)

Print a message and, optionally, log it with logger. This is executed only on the rank-0 GPU.

Parameters

msg (str) – A string message to output
logger (optional) – Python logger object, defaults to None
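The rank-0 gating can be sketched as follows, assuming the global rank is passed in explicitly. print_rank_0_sketch is a hypothetical name; the real function presumably obtains the rank from the distributed context rather than taking it as an argument.

```python
import logging

def print_rank_0_sketch(msg, rank, logger=None):
    """Hypothetical: print msg (and log it if a logger is given) only on rank 0."""
    if rank != 0:
        return False
    print(msg, flush=True)
    if logger is not None:
        logger.info(msg)
    return True

logger = logging.getLogger("demo")
for rank in range(2):  # simulate two processes; only rank 0 prints
    print_rank_0_sketch(f"epoch 1 done (rank {rank})", rank, logger)
```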

colossalai.utils.sync_model_param(model, parallel_mode)

Make sure model parameters are consistent across ranks in data-parallel mode.

Parameters

model (torch.nn.Module) – A PyTorch model whose parameters should be kept consistent
parallel_mode (colossalai.context.ParallelMode) – Parallel mode to be checked
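A pure-Python simulation of what this synchronization achieves, assuming the parameters on the first rank of the parallel_mode group act as the source and are broadcast to every other rank in that group. Ranks are modeled as dictionary keys; sync_model_param_sketch is hypothetical and performs no real communication.

```python
def sync_model_param_sketch(params_per_rank, group_ranks):
    """Hypothetical simulation: params_per_rank maps rank -> {name: value}.
    Every rank in the group receives a copy of the first rank's parameters,
    mimicking a broadcast from group_ranks[0]."""
    src = group_ranks[0]
    for rank in group_ranks[1:]:
        params_per_rank[rank] = dict(params_per_rank[src])
    return params_per_rank

# Three data-parallel ranks that started with different initializations
params = {0: {"w": 1.0, "b": 0.0}, 1: {"w": 0.7, "b": 0.3}, 2: {"w": 0.2, "b": 0.9}}
sync_model_param_sketch(params, group_ranks=[0, 1, 2])
```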

colossalai.utils.clip_grad_norm_fp32(parameters, max_norm, norm_type=2)

Clips gradient norm of an iterable of parameters whose gradients are in fp32.

This is adapted from torch.nn.utils.clip_grad.clip_grad_norm_() with added functionality to handle model-parallel parameters. Note that the gradients are modified in place.

Parameters

parameters (Iterable[Tensor] or Tensor) – An iterable of Tensors or a single Tensor whose gradients will be normalized
max_norm (float or int) – Max norm of the gradients
norm_type (float or int, optional) – Type of the p-norm to use. Can be 'inf' for the infinity norm, defaults to 2

Returns

Total norm of the parameters (viewed as a single vector).

Return type

float
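The norm computation and clipping rule can be sketched on plain lists of floats. This follows the formula used by torch.nn.utils.clip_grad_norm_(): the total norm is (Σ|g|^p)^(1/p) over all gradient entries (or max|g| for the infinity norm), and every gradient is scaled by max_norm / (total_norm + 1e-6) when that factor is below 1. clip_grad_norm_sketch is hypothetical, and the reduction of partial norms across model-parallel ranks that the real function adds is omitted.

```python
import math

def clip_grad_norm_sketch(grads, max_norm, norm_type=2):
    """Hypothetical: grads is a list of gradient lists, modified in place."""
    norm_type = float(norm_type)  # float("inf") also accepts the string "inf"
    flat = [abs(g) for grad in grads for g in grad]
    if norm_type == math.inf:
        total_norm = max(flat, default=0.0)
    else:
        total_norm = sum(g ** norm_type for g in flat) ** (1.0 / norm_type)
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        # Scale every gradient by the same factor, preserving its direction
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm

grads = [[3.0], [4.0]]
total = clip_grad_norm_sketch(grads, max_norm=1.0)
# total is 5.0; grads are scaled to roughly [[0.6], [0.8]]
```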

colossalai.utils.get_current_device(): Returns the index of the currently selected device (GPU or CPU).

colossalai.utils.synchronize(): Similar to torch.cuda.synchronize(). Waits for all kernels in all streams on a CUDA device to complete.

colossalai.utils.empty_cache(): Similar to torch.cuda.empty_cache(). Releases all unoccupied cached memory currently held by the caching allocator.

colossalai.utils.set_to_cuda(models)

Move the model(s) to the GPU.

Parameters: models – An nn.Module or a list of nn.Module objects

colossalai.utils.report_memory_usage(message, logger=None, report_cpu=False)

Calculate and print GPU memory usage (in GB), and optionally CPU memory usage

Parameters

message (str) – A prefix message to add in the log
logger (colossalai.logging.DistributedLogger, optional) – An instance of colossalai.logging.DistributedLogger
report_cpu (bool, optional) – Whether to report CPU memory

Raises

EnvironmentError – Raised if no distributed environment has been initialized

class colossalai.utils.Timer

A timer object that records execution times and provides tools to summarize them.

start(): Fisrtly synchronize cuda, reset the clock and then start the timer.

lap(): lap time and return elapsed time

stop(keep_in_history=False)

Stop the timer and record the start-stop time interval.

Parameters: keep_in_history (bool, optional) – Whether to record this start-stop interval in the history, defaults to False
Returns: Start-stop interval
Return type: float

get_history_mean()

Return the mean of all start-stop intervals recorded in the history.

Returns: Mean of time intervals
Return type: float

get_history_sum()

Return the sum of all start-stop intervals recorded in the history.

Returns: Sum of time intervals
Return type: float

get_elapsed_time()

Return the last start-stop time interval.

Note

Use it only when the timer is not running.

Returns: The last time interval
Return type: float

reset(): Clear up the timer and its history

class colossalai.utils.MultiTimer(on=True)

An object that contains multiple named timers

Parameters: on (bool, optional) – Whether the timers are enabled, defaults to True

start(name)

Start the timer with the given name.

Parameters: name (str) – Timer’s key

stop(name, keep_in_history)

Stop the timer with the given name.

Parameters

name (str) – Timer’s key
keep_in_history (bool) – Whether to record this start-stop interval in the history

get_timer(name)

Get a timer by its name.

Parameters: name – Timer’s key
Returns: The timer with the given name
Return type: Timer

reset(name=None)

Reset timers.

Parameters: name (optional) – If given, only the named timer is reset; otherwise all timers are reset, defaults to None
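A minimal sketch of a registry of named timers, including the on switch and the reset(name=None) behavior. MultiTimerSketch is hypothetical: it keeps start times and histories in dictionaries instead of holding Timer objects, and it omits get_timer().

```python
import time

class MultiTimerSketch:
    """Hypothetical registry of named timers; every call is a no-op when on=False."""

    def __init__(self, on=True):
        self._on = on
        self._starts = {}   # name -> start time of the running interval
        self._history = {}  # name -> list of recorded intervals

    def start(self, name):
        if self._on:
            self._starts[name] = time.perf_counter()

    def stop(self, name, keep_in_history):
        if not self._on:
            return None
        elapsed = time.perf_counter() - self._starts.pop(name)
        if keep_in_history:
            self._history.setdefault(name, []).append(elapsed)
        return elapsed

    def reset(self, name=None):
        # Reset one named timer, or all timers when name is None
        if name is not None:
            self._starts.pop(name, None)
            self._history.pop(name, None)
        else:
            self._starts.clear()
            self._history.clear()

timers = MultiTimerSketch()
for step in range(2):
    timers.start("forward")
    sum(range(1_000))   # stand-in for the forward pass
    timers.stop("forward", keep_in_history=True)
    timers.start("backward")
    sum(range(2_000))   # stand-in for the backward pass
    timers.stop("backward", keep_in_history=True)
timers.reset("forward")  # only the "forward" timer is cleared
```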

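colossalai.utils.accumulate_gradient(model, optimizer, dataloader, accumulate_size, gradient_handlers=None, lr_scheduler=None)

Turn the model, optimizer, dataloader and (optionally) gradient handlers and lr scheduler into their gradient-accumulation counterparts, so that gradients are accumulated over accumulate_size steps before each update.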

Parameters

model (torch.nn.Module) – your model object
optimizer (torch.optim.Optimizer) – your optimizer object
dataloader (Iterable) – your dataloader object
accumulate_size (int) – the number of steps to accumulate gradients
gradient_handlers (List[colossalai.engine.BaseGradientHandler]) – list of gradient handler objects. Default is None
lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – your lr scheduler object. Default is None
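The effect of gradient accumulation can be sketched with a one-parameter model and plain floats: each micro-batch gradient is scaled by 1/accumulate_size and summed, and the weight is updated only once every accumulate_size batches. train_with_accumulation is a hypothetical helper; that the objects returned by accumulate_gradient behave this way (deferring optimizer.step() and zero_grad() until enough steps have accumulated) is an assumption of this sketch.

```python
def train_with_accumulation(w, data, lr, accumulate_size):
    """Hypothetical: SGD on the loss (w*x - y)**2 with gradient accumulation."""
    grad = 0.0
    updates = 0
    for i, (x, y) in enumerate(data, start=1):
        # Gradient of (w*x - y)**2, scaled so the accumulated value is a mean
        grad += 2 * (w * x - y) * x / accumulate_size
        if i % accumulate_size == 0:
            w -= lr * grad  # plays the role of optimizer.step()
            grad = 0.0      # plays the role of optimizer.zero_grad()
            updates += 1
    # Leftover micro-batches (len(data) % accumulate_size) are not applied in this sketch
    return w, updates

w, updates = train_with_accumulation(0.0, [(1.0, 2.0)] * 8, lr=0.1, accumulate_size=4)
print(w, updates)  # two updates from eight micro-batches
```

Scaling each micro-batch gradient by 1/accumulate_size makes one accumulated update equal to a single step on the mean gradient of a batch accumulate_size times larger.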

class colossalai.utils.DataParallelSampler(dataset, shuffle=False, seed=0, drop_last=False)

A data sampler for distributed data parallelism

Parameters

dataset (torch.utils.data.Dataset) – A Dataset instance
shuffle (bool, optional) – Whether to shuffle data, defaults to False
seed (int, optional) – The random seed, defaults to 0
drop_last (bool, optional) – If True and the dataset size is not divisible by the number of replicas, drop the tail of the data so that every replica receives the same number of samples. If False, the indices are padded with repeated samples instead, defaults to False

set_epoch(epoch)

Sets the epoch for this sampler. When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

Parameters: epoch (int) – Epoch number.
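A pure-Python sketch of how such a sampler can split indices across replicas, modeled on torch.utils.data.DistributedSampler (assumed, not confirmed, to match DataParallelSampler). num_replicas and rank are passed in explicitly here, whereas the real sampler presumably reads them from the data-parallel process group; the class name is hypothetical.

```python
import math
import random

class DataParallelSamplerSketch:
    """Hypothetical sampler modeled on torch.utils.data.DistributedSampler."""

    def __init__(self, dataset_len, num_replicas, rank, shuffle=False, seed=0, drop_last=False):
        self.dataset_len = dataset_len
        self.num_replicas = num_replicas
        self.rank = rank
        self.shuffle = shuffle
        self.seed = seed
        self.drop_last = drop_last
        self.epoch = 0
        if drop_last and dataset_len % num_replicas != 0:
            # Drop the tail so every replica gets the same number of samples
            self.num_samples = math.ceil((dataset_len - num_replicas) / num_replicas)
        else:
            self.num_samples = math.ceil(dataset_len / num_replicas)
        self.total_size = self.num_samples * num_replicas

    def set_epoch(self, epoch):
        self.epoch = epoch

    def __iter__(self):
        indices = list(range(self.dataset_len))
        if self.shuffle:
            # Same seed + epoch on every replica -> same global permutation
            random.Random(self.seed + self.epoch).shuffle(indices)
        if self.drop_last:
            indices = indices[: self.total_size]
        else:
            # Pad by repeating indices from the start of the list
            padding = self.total_size - len(indices)
            indices += (indices * math.ceil(padding / len(indices)))[:padding]
        # Each replica takes every num_replicas-th index, starting at its rank
        return iter(indices[self.rank : self.total_size : self.num_replicas])

    def __len__(self):
        return self.num_samples

shards = [list(DataParallelSamplerSketch(10, 4, r)) for r in range(4)]
print(shards)  # [[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]
```

Because every replica seeds its shuffle with seed + epoch, the replicas agree on one global permutation and their shards never overlap; calling set_epoch() with a new value changes that permutation from one epoch to the next.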

colossalai.utils.get_dataloader(dataset, shuffle=False, seed=1024, add_sampler=True, drop_last=False, pin_memory=False, num_workers=0, **kwargs)

Set up a deterministic dataloader, configuring seeded workers, the sampler, and whether to shuffle.

Note

When pipeline parallelism is enabled, shuffle cannot be True, because shuffling would cause a mismatch between the input data on the first stage and the labels on the last stage.

Parameters

dataset (torch.utils.data.Dataset) – A torch.utils.data.Dataset object
shuffle (bool, optional) – Whether to shuffle the dataset, defaults to False
seed (int, optional) – Random seed for the dataloader workers, defaults to 1024
add_sampler (bool, optional) – Whether to add a DataParallelSampler to the dataloader, defaults to True
drop_last (bool, optional) – Whether to drop the last incomplete batch of data, defaults to False
pin_memory (bool, optional) – Whether to copy loaded batches into pinned (page-locked) CPU memory, defaults to False
num_workers (int, optional) – Number of worker processes for this dataloader, defaults to 0
kwargs – Additional keyword arguments passed to torch.utils.data.DataLoader

Returns

A torch.utils.data.DataLoader object

Return type

torch.utils.data.DataLoader