colossalai.engine.schedule

class colossalai.engine.schedule.BaseSchedule(batch_data_process_func=None)

A basic helper class that controls the training or evaluation process. It mainly consists of forward_backward_step for gradient backpropagation and optimizer_step for parameter updates. To make FP16 easy to enable, all code that controls FP16 is aggregated in the schedule class.

load_batch(data_iter, to_gpu=True)

Loads a batch from the data iterator. It returns the data and labels, which are already on the same GPU as the model.

Parameters

data_iter (DataIter) – Data iterator from which a batch of data is fetched
to_gpu (bool, optional) – Whether the data should be moved to GPU

Returns

(data, label)

Return type

(torch.Tensor, torch.Tensor)
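The behaviour described for load_batch can be sketched without a GPU. This is a minimal illustration, not the library's implementation: load_batch_sketch and its to_device argument are hypothetical names, and the sketch assumes the iterator yields (data, label) pairs.

```python
from typing import Any, Callable, Iterator, Optional, Tuple


def load_batch_sketch(
    data_iter: Iterator[Any],
    to_device: Optional[Callable[[Any], Any]] = None,
) -> Tuple[Any, Any]:
    """Hypothetical stand-in for load_batch: fetch one (data, label) pair."""
    if data_iter is None:
        raise RuntimeError("data iterator is not defined")
    # StopIteration propagates to the caller when the iterator is exhausted.
    data, label = next(data_iter)
    if to_device is not None:
        # Real code would move tensors to the model's device; here any
        # callable works, so the sketch runs on a machine without a GPU.
        data, label = to_device(data), to_device(label)
    return data, label


batches = iter([([1.0, 2.0], 0), ([3.0, 4.0], 1)])
first = load_batch_sketch(batches)
second = load_batch_sketch(batches, to_device=lambda x: ("gpu", x))
```

Passing to_device=None plays the role of to_gpu=False: the batch is returned exactly as the iterator produced it.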

pre_processing(engine)

Performs actions before running the schedule.

abstract forward_backward_step(engine, data_iter, forward_only, return_loss=True, return_output_label=True)

Processes one batch of data for training or evaluation.

Parameters

engine (colossalai.engine.Engine) – Colossalai training engine
data_iter (DataIter) – Data iterator from which a batch of data is fetched
forward_only (bool) – If True, the backward pass is skipped
return_loss (bool, optional) – If False, the loss won’t be returned
return_output_label (bool, optional) – If False, the output and label won’t be returned

class colossalai.engine.schedule.NonPipelineSchedule(batch_data_process_func=None)

A helper schedule class for running environments without pipeline parallelism. In one step, it loads a batch of data and feeds it to the model. After getting the output and calculating the loss, it uses step() to update the parameters if it is in training mode.

forward_backward_step(engine, data_iter, forward_only=False, return_loss=True, return_output_label=True)

The process function that loads a batch of data and feeds it to the model. The returned labels and loss will be None if return_loss is False.

Parameters

engine (colossalai.engine.Engine) – Engine that wraps the model for training and inference
data_iter (Iterator) – Data iterator of the dataloader, e.g. iter(dataloader)
forward_only (bool, optional) – If True, only the forward pass is run; otherwise backpropagation is executed as well
return_loss (bool, optional) – Loss will be returned if True
return_output_label (bool, optional) – Output and label will be returned if True

Returns

(output, label, loss)

Return type

Tuple[torch.Tensor]
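The control flow of a non-pipeline step can be sketched with a stub engine. This is only an illustration under stated assumptions: StubEngine and forward_backward_step_sketch are hypothetical, the stub's method names (calling the engine, criterion, backward) are chosen to resemble an engine interface, the two return flags are assumed to act independently as the parameter list describes, and the parameter update is left out.

```python
from typing import Iterator, List, Optional, Tuple


class StubEngine:
    """Hypothetical stand-in for an engine; it records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __call__(self, data: float) -> float:
        self.calls.append("forward")
        return 2.0 * data  # toy "model": y = 2x

    def criterion(self, output: float, label: float) -> float:
        self.calls.append("criterion")
        return (output - label) ** 2  # squared error

    def backward(self, loss: float) -> None:
        self.calls.append("backward")  # a real engine would compute gradients


def forward_backward_step_sketch(
    engine: StubEngine,
    data_iter: Iterator[Tuple[float, float]],
    forward_only: bool = False,
    return_loss: bool = True,
    return_output_label: bool = True,
) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    data, label = next(data_iter)           # load one batch
    output = engine(data)                   # forward pass
    loss = engine.criterion(output, label)  # compute the loss
    if not forward_only:
        engine.backward(loss)               # backward pass, training only
    return (
        output if return_output_label else None,
        label if return_output_label else None,
        loss if return_loss else None,
    )


train_engine = StubEngine()
train_result = forward_backward_step_sketch(train_engine, iter([(1.0, 3.0)]))

eval_engine = StubEngine()
eval_result = forward_backward_step_sketch(
    eval_engine, iter([(1.0, 3.0)]), forward_only=True, return_output_label=False
)
```

In the training call the stub records a backward call; in the evaluation call it does not, and the output and label slots of the returned tuple are None.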

class colossalai.engine.schedule.PipelineSchedule(num_microbatches, batch_data_process_func=None, tensor_shape=None, scatter_gather_tensors=False)

A helper schedule class for running environments with pipeline parallelism. It uses the non-interleaved 1F1B strategy. Other properties are similar to those of NonPipelineSchedule.

Parameters

num_microbatches (int) – The number of microbatches
batch_data_process_func (Callable, optional) – The preprocessing function which receives a batch of data, and it will be executed in load_batch
tensor_shape (torch.Size, optional) – Specified shape in pipeline communication
scatter_gather_tensors (bool, optional) – If True, communication across pipeline stages is reduced when using 1D tensor parallelism
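To illustrate what num_microbatches means, the sketch below splits one batch into equally sized microbatches. It is a hypothetical helper, not part of the library, and it assumes the batch size must be divisible by num_microbatches.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_microbatches(batch: Sequence[T], num_microbatches: int) -> List[Sequence[T]]:
    """Split a batch into num_microbatches contiguous, equally sized chunks."""
    if num_microbatches <= 0:
        raise ValueError("num_microbatches must be positive")
    if len(batch) % num_microbatches != 0:
        # Assumption of this sketch: uneven splits are rejected.
        raise ValueError("batch size must be divisible by num_microbatches")
    size = len(batch) // num_microbatches
    return [batch[i * size:(i + 1) * size] for i in range(num_microbatches)]


microbatches = split_into_microbatches(list(range(8)), num_microbatches=4)
```

Each microbatch then flows through the pipeline stages separately, which is what lets different stages work on different microbatches at the same time.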

forward_step(engine, input_tensor, return_tensors, return_output_label=True, accum_loss=None)

Forward step for passed-in model. If it is the first stage, the input tensor is obtained from data_iterator, otherwise the passed-in input_tensor is used. Returns output tensor. This is a helper function and can be ignored by users.

Parameters

engine (colossalai.engine.Engine) – Your engine object
input_tensor (torch.Tensor) – Input tensor for this pipeline stage
return_tensors (List[torch.Tensor]) – A list of tensors to return
return_output_label (bool, optional) – Whether to return the output and label
accum_loss (optional) – Where the accumulated loss is stored

Returns

output or the loss value of the current pipeline stage

Return type

torch.Tensor

backward_step(engine, input_tensor, output_tensor, output_tensor_grad)

Backward step through the passed-in output tensor. If this is the last stage, output_tensor_grad is None; otherwise it is the gradient with respect to this stage's output tensor. Returns the gradient with respect to the input tensor (None if this is the first stage). This is a helper function and can be ignored by users.

Parameters

engine (colossalai.engine.Engine) – Your engine object
input_tensor (torch.Tensor) – Input tensor for this pipeline stage
output_tensor (torch.Tensor) – Output tensor for this pipeline stage
output_tensor_grad (torch.Tensor) – Gradient of the output tensor for this pipeline stage

Returns

gradient of input tensor

Return type

torch.Tensor
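The gradient hand-off between stages can be shown on plain floats. The sketch below is a toy model, not the library's code: ScalarStage is hypothetical, each stage computes y = weight * x, and the last stage's output is treated as the loss itself, so its incoming gradient starts at 1.

```python
from typing import Optional


class ScalarStage:
    """Hypothetical pipeline stage computing y = weight * x on plain floats."""

    def __init__(self, weight: float, is_first: bool, is_last: bool) -> None:
        self.weight = weight
        self.is_first = is_first
        self.is_last = is_last
        self.weight_grad = 0.0

    def forward(self, x: float) -> float:
        return self.weight * x

    def backward(
        self, input_value: float, output_value: float, output_grad: Optional[float]
    ) -> Optional[float]:
        # output_value mirrors the documented signature; this linear toy
        # stage does not need it to compute gradients.
        if self.is_last:
            # The last stage receives no gradient from downstream; its output
            # is the loss, so d(loss)/d(output) = 1.
            output_grad = 1.0
        self.weight_grad += output_grad * input_value  # d(loss)/d(weight)
        if self.is_first:
            return None  # nothing upstream needs an input gradient
        return output_grad * self.weight  # d(loss)/d(input)


first_stage = ScalarStage(2.0, is_first=True, is_last=False)
last_stage = ScalarStage(3.0, is_first=False, is_last=True)

x = 5.0
hidden = first_stage.forward(x)     # 10.0
loss = last_stage.forward(hidden)   # 30.0

grad_hidden = last_stage.backward(hidden, loss, None)
grad_input = first_stage.backward(x, hidden, grad_hidden)
```

Since the loss equals 2 * 3 * x, the accumulated weight gradients can be checked by hand: 15 for the first stage (3 * 5) and 10 for the last stage (2 * 5).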

forward_backward_step(engine, data_iter, forward_only=False, return_loss=True, return_output_label=True)

Runs the non-interleaved 1F1B schedule, with communication between pipeline stages. Returns a tuple with losses if this is the last stage, and an empty tuple otherwise.

Parameters

engine (colossalai.engine.Engine) – Your engine object
data_iter (Iterable) – Dataloader in the form of an iterator, obtained by calling iter(dataloader)
forward_only (bool) – Whether to run the forward step only. Default is False. If True, no backward pass is run.
return_loss (bool) – Whether to return the loss value. Default is True.
return_output_label (bool) – If False, the output and label won’t be returned

Returns

(output, label, loss)

Return type

Tuple[torch.Tensor]
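The order of forward (F) and backward (B) passes on one stage under a non-interleaved 1F1B schedule can be sketched as follows. This assumes the commonly described warm-up, steady-state, and cool-down structure; one_f_one_b_order is a hypothetical helper, and communication between stages is omitted.

```python
from typing import List, Tuple


def one_f_one_b_order(
    num_stages: int, stage_rank: int, num_microbatches: int
) -> List[Tuple[str, int]]:
    """Return the (operation, microbatch index) sequence for one pipeline stage."""
    # Earlier stages run more forward passes before their first backward pass.
    num_warmup = min(num_stages - stage_rank - 1, num_microbatches)
    num_steady = num_microbatches - num_warmup

    order: List[Tuple[str, int]] = []
    fwd = 0  # next microbatch to run forward
    bwd = 0  # next microbatch to run backward

    for _ in range(num_warmup):   # warm-up: forward passes only
        order.append(("F", fwd))
        fwd += 1
    for _ in range(num_steady):   # steady state: one forward, one backward
        order.append(("F", fwd))
        fwd += 1
        order.append(("B", bwd))
        bwd += 1
    for _ in range(num_warmup):   # cool-down: remaining backward passes
        order.append(("B", bwd))
        bwd += 1
    return order


last_stage_order = one_f_one_b_order(num_stages=4, stage_rank=3, num_microbatches=4)
middle_stage_order = one_f_one_b_order(num_stages=4, stage_rank=2, num_microbatches=4)
first_stage_order = one_f_one_b_order(num_stages=4, stage_rank=0, num_microbatches=4)
```

With four stages and four microbatches, the last stage alternates F and B from the start, while the first stage runs three warm-up forwards before its first backward pass. Each stage keeps at most num_warmup + 1 microbatches in flight, which is the memory advantage usually cited for 1F1B.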