colossalai.utils
- colossalai.utils.checkpoint(function, activation_offload, *args)
Checkpoint the computation while preserving the RNG states. Modified from PyTorch's torch.utils.checkpoint.
- Parameters
function – The function that runs the forward pass. It should know how to handle the input tuples.
args – Tuple containing the arguments passed to function
- Returns
Output of running function with provided args
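To illustrate what preserving the RNG states means when a checkpointed forward pass is recomputed, here is a minimal pure-Python sketch. It is not the ColossalAI implementation: the standard random module stands in for the framework's random number generators, and the helper names run_with_saved_rng and noisy_forward are hypothetical.

```python
import random


def run_with_saved_rng(function, *args):
    """Run `function` once and return (output, recompute).

    `recompute` re-runs `function` under the RNG state captured before the
    first run, then restores whatever state was active when it was called.
    """
    saved_state = random.getstate()  # snapshot taken before the forward pass
    output = function(*args)

    def recompute():
        current_state = random.getstate()
        random.setstate(saved_state)  # replay the same random draws
        try:
            return function(*args)
        finally:
            random.setstate(current_state)  # leave the caller's RNG untouched

    return output, recompute


def noisy_forward(values):
    # Stand-in for a dropout-like layer: zero each element with probability 0.5.
    return [v if random.random() > 0.5 else 0.0 for v in values]


output, recompute = run_with_saved_rng(noisy_forward, [1.0, 2.0, 3.0, 4.0])
random.random()  # advance the RNG, as later layers would
assert recompute() == output  # the recomputation sees the same random draws
```

Without the saved state, the second run would draw different random numbers and the recomputed activations would no longer match the original forward pass.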
- colossalai.utils.print_rank_0(msg, logger=None)
Print a message and optionally save it to the log. This is executed only on the rank-0 GPU.
- Parameters
msg (str) – A string message to output
logger (optional) – Python logger object, defaults to None
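The rank gate described above can be sketched in plain Python. This is only an illustration under stated assumptions: the hypothetical print_on_rank_0 takes the rank as an explicit argument, whereas the library function determines the rank from the distributed context.

```python
import logging


def print_on_rank_0(msg, rank, logger=None):
    """Print `msg` only when `rank` is 0; also log it if a logger is given.

    Returns True if the message was emitted, False otherwise.
    `rank` is a hypothetical explicit argument used for illustration.
    """
    if rank != 0:
        return False
    print(msg, flush=True)
    if logger is not None:
        logger.info(msg)
    return True


logger = logging.getLogger("sketch")
for rank in range(4):  # pretend four processes each make the same call
    print_on_rank_0(f"hello from rank {rank}", rank, logger=logger)
```

Only the call made with rank 0 produces output, which keeps multi-process logs free of duplicated lines.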
- colossalai.utils.sync_model_param(model, parallel_mode)
Make sure data parameters are consistent during Data Parallel Mode
- Parameters
model (torch.nn.Module) – A PyTorch nn.Module whose parameters are checked for consistency
parallel_mode (colossalai.context.ParallelMode) – Parallel mode to be checked
- colossalai.utils.clip_grad_norm_fp32(parameters, max_norm, norm_type=2)
Clips gradient norm of an iterable of parameters whose gradients are in fp32.
This is adapted from torch.nn.utils.clip_grad.clip_grad_norm_() and added functionality to handle model parallel parameters. Note that the gradients are modified in place.
- Parameters
parameters (Iterable[Tensor] or Tensor) – An iterable of Tensors or a single Tensor that will have gradients normalized
max_norm (float or int) – Max norm of the gradients
norm_type (float or int) – Type of the used p-norm. Can be 'inf' for infinity norm.
- Returns
Total norm of the parameters (viewed as a single vector).
- Return type
float
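The clipping rule can be sketched without tensors. The example below is an assumption-laden illustration, not the library code: the hypothetical clip_by_total_norm works on nested lists of floats, ignores model parallelism entirely, and borrows the 1e-6 epsilon from the PyTorch function this utility is adapted from.

```python
def clip_by_total_norm(grads, max_norm, norm_type=2):
    """Scale `grads` in place so their total p-norm is at most `max_norm`.

    `grads` is a list of lists of floats, one inner list per parameter.
    Returns the total norm measured before clipping.
    """
    flat = [g for grad in grads for g in grad]
    if not flat:
        return 0.0
    if norm_type in ("inf", float("inf")):
        total_norm = max(abs(g) for g in flat)
    else:
        total_norm = sum(abs(g) ** norm_type for g in flat) ** (1.0 / norm_type)
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:  # only ever scale gradients down, never up
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm


grads = [[3.0], [4.0]]  # treated as one vector, the 2-norm is 5
total = clip_by_total_norm(grads, max_norm=1.0)
print(total, grads)
```

The norm is computed over all gradients viewed as a single vector, so every gradient is scaled by the same coefficient and the direction of the update is unchanged.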
- colossalai.utils.get_current_device()
Returns the index of the currently selected device (GPU/CPU).
- colossalai.utils.synchronize()
Similar to cuda.synchronize(). Waits for all kernels in all streams on a CUDA device to complete.
- colossalai.utils.empty_cache()
Similar to cuda.empty_cache(). Releases all unoccupied cached memory currently held by the caching allocator.
- colossalai.utils.set_to_cuda(models)
Send the model(s) to the GPU.
- Parameters
models – An nn.Module or a list of modules
- colossalai.utils.report_memory_usage(message, logger=None, report_cpu=False)
Calculate and print RAM usage (in GB)
- Parameters
message (str) – A prefix message to add in the log
logger (colossalai.logging.DistributedLogger, optional) – An instance of colossalai.logging.DistributedLogger
report_cpu (bool, optional) – Whether to report CPU memory
- Raises
EnvironmentError – Raised if no distributed environment has been initialized
- class colossalai.utils.Timer
A timer object which helps to log the execution times, and provides different tools to assess the times.
- start()
First synchronize CUDA, then reset the clock and start the timer.
- lap()
Take a lap time and return the elapsed time.
- stop(keep_in_history=False)
Stop the timer and record the start-stop time interval.
- Parameters
keep_in_history (bool, optional) – Whether to record each start-stop interval in the history, defaults to False
- Returns
Start-stop interval
- Return type
int
- get_history_mean()
Mean of all history start-stop time intervals.
- Returns
Mean of time intervals
- Return type
int
- get_history_sum()
Add up all the start-stop time intervals.
- Returns
Sum of time intervals
- Return type
int
- get_elapsed_time()
Return the last start-stop time interval.
Note
Use it only when timer is not in progress
- Returns
The last time interval
- Return type
int
- reset()
Clear the timer and its history.
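The start/stop/history behaviour documented for this class can be approximated with a small wall-clock timer. The sketch below is hypothetical (the name SketchTimer and its internals are assumptions) and omits the CUDA synchronization that the real timer performs before reading the clock.

```python
import time


class SketchTimer:
    """Wall-clock timer that optionally keeps a history of intervals."""

    def __init__(self):
        self._started = False
        self._start_time = 0.0
        self._elapsed = 0.0
        self._history = []

    def start(self):
        # A GPU-aware timer would synchronize the device here first.
        self._elapsed = 0.0
        self._start_time = time.perf_counter()
        self._started = True

    def stop(self, keep_in_history=False):
        self._elapsed = time.perf_counter() - self._start_time
        if keep_in_history:
            self._history.append(self._elapsed)
        self._started = False
        return self._elapsed

    def get_history_sum(self):
        return sum(self._history)

    def get_history_mean(self):
        # Guard against an empty history instead of dividing by zero.
        return sum(self._history) / len(self._history) if self._history else 0.0

    def get_elapsed_time(self):
        assert not self._started, "timer is still running"
        return self._elapsed

    def reset(self):
        self._started = False
        self._elapsed = 0.0
        self._history = []


timer = SketchTimer()
for _ in range(3):
    timer.start()
    sum(range(10_000))  # the work being timed
    timer.stop(keep_in_history=True)
print(f"mean {timer.get_history_mean():.6f}s over 3 runs")
```

Passing keep_in_history=True on every stop is what makes get_history_mean() and get_history_sum() meaningful; with the default of False only the last interval is retained.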
- class colossalai.utils.MultiTimer(on=True)
An object that contains multiple timers.
- Parameters
on (bool, optional) – Whether the timer is enabled. Default is True
- start(name)
Start one of the timers by name.
- Parameters
name (str) – Timer’s key
- stop(name, keep_in_history)
Stop one of the timers by name.
- Parameters
name (str) – Timer’s key
keep_in_history (bool) – Whether to record each start-stop interval in the history
- get_timer(name)
Get timer by its name (from multitimer)
- Parameters
name – Timer’s key
- Returns
The timer with the given name
- Return type
- reset(name=None)
Reset timers.
- Parameters
name (optional) – If name is designated, the named timer will be reset and others will not, defaults to None
- colossalai.utils.accumulate_gradient(model, optimizer, dataloader, accumulate_size, gradient_handlers=None, lr_scheduler=None)
- Parameters
model (torch.nn.Module) – your model object
optimizer (torch.optim.Optimizer) – your optimizer object
dataloader (Iterable) – your dataloader object
accumulate_size (int) – the number of steps to accumulate gradients
gradient_handlers (List[colossalai.engine.BaseGradientHandler]) – list of gradient handler objects. Default is None
lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – your lr scheduler object. Default is None
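As background on what accumulate_size controls, the following pure-Python sketch shows the general gradient accumulation pattern on a one-parameter least-squares model. It does not use or reproduce this function: the name train_with_accumulation is hypothetical, and dividing each gradient by accumulate_size (so the update uses the average) is an assumption about the usual convention.

```python
def train_with_accumulation(batches, accumulate_size, lr=0.1, w=0.0):
    """Fit y = w * x, updating `w` once per `accumulate_size` batches.

    Returns the final weight and the number of optimizer steps taken.
    """
    grad_buffer = 0.0
    steps = 0
    for i, (x, y) in enumerate(batches, start=1):
        # Gradient of 0.5 * (w * x - y) ** 2 with respect to w, pre-scaled
        # so that the accumulated sum is an average over the window.
        grad_buffer += (w * x - y) * x / accumulate_size
        if i % accumulate_size == 0:
            w -= lr * grad_buffer  # one optimizer step per window
            grad_buffer = 0.0      # equivalent of zeroing the gradients
            steps += 1
    return w, steps


batches = [(1.0, 2.0)] * 4
w, steps = train_with_accumulation(batches, accumulate_size=2)
print(w, steps)
```

Four batches with accumulate_size=2 give two parameter updates, which is why a learning rate scheduler tied to optimizer steps also advances less often when accumulation is enabled.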
- class colossalai.utils.DataParallelSampler(dataset, shuffle=False, seed=0, drop_last=False)
A data sampler for distributed data parallelism
- Parameters
dataset (torch.utils.data.Dataset) – A Dataset instance
shuffle (bool, optional) – Whether to shuffle data, defaults to False
seed (int, optional) – The random seed, defaults to 0
drop_last (bool, optional) – Set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller, defaults to False
- set_epoch(epoch)
Sets the epoch for this sampler. When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.
- Parameters
epoch (int) – Epoch number.
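Why the epoch has to be set can be seen in a small sketch of epoch-seeded sharding. The function shard_indices below is hypothetical and simplified: it seeds the shuffle with seed + epoch (an assumption, mirroring the common PyTorch convention) and does not pad or drop samples to equalize shard sizes, which a real sampler may do.

```python
import random


def shard_indices(dataset_len, rank, world_size, epoch, seed=0, shuffle=True):
    """Return the dataset indices that `rank` should read in `epoch`."""
    indices = list(range(dataset_len))
    if shuffle:
        # Every rank builds the same permutation because the seed depends
        # only on (seed, epoch), never on the rank.
        random.Random(seed + epoch).shuffle(indices)
    return indices[rank::world_size]  # interleaved, non-overlapping shards


for epoch in range(2):
    shards = [shard_indices(10, rank, 2, epoch) for rank in range(2)]
    print(epoch, shards)
```

If the epoch were never updated, the permutation would be identical on every pass over the data, which is the behaviour the set_epoch description warns about.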
- colossalai.utils.get_dataloader(dataset, shuffle=False, seed=1024, add_sampler=True, drop_last=False, pin_memory=False, num_workers=0, **kwargs)
Set up a deterministic dataloader (also configures worker seeding, samplers, and whether to shuffle).
Note
When pipeline parallelism is enabled, shuffle cannot be True, as it would result in a mismatch between the input data on the first stage and the labels on the last stage.
- Parameters
dataset (torch.utils.data.Dataset) – A torch.utils.data.Dataset object
shuffle (bool, optional. Default is False) – Whether to shuffle the dataset
seed (int, optional. Default is 1024) – Random worker seed
add_sampler (bool, optional. Default is True) – Whether to add a DataParallelSampler to the dataset
drop_last (bool, optional. Default is False) – Drop the last incomplete batch of data
pin_memory (bool, optional. Default is False) – Whether to pin memory address in CPU memory
num_workers (int, optional. Default is 0) – Number of worker subprocesses for this dataloader
- Returns
A torch.utils.data.DataLoader object
- Return type
torch.utils.data.DataLoader