colossalai.engine.schedule
- class colossalai.engine.schedule.BaseSchedule(batch_data_process_func=None)
A basic helper class to control the process of training or evaluation. It mainly consists of forward_backward_step for the gradient backward pass and optimizer_step for the parameter update. To make FP16 easy to enable, all code that controls FP16 is aggregated in the schedule class.
- load_batch(data_iter, to_gpu=True)
Loads a batch from the data iterator. It returns the data and labels, which are already on the same GPU as the model.
- Parameters
data_iter (DataIter) – Data iterator from which to get a batch of data
to_gpu (bool, optional) – Whether the data should be moved to GPU
- Returns
(data, label)
- Return type
(Tensor, torch.Tensor)
- pre_processing(engine)
To perform actions before running the schedule.
- abstract forward_backward_step(engine, data_iter, forward_only, return_loss=True, return_output_label=True)
Processes one batch of the dataset for training or evaluation.
- Parameters
engine (colossalai.engine.Engine) – Colossalai training engine
data_iter (DataIter) – Data iterator from which to get a batch of data
forward_only (bool) – If True, the process won’t include backward
return_loss (bool, optional) – If False, the loss won’t be returned
return_output_label (bool, optional) – If False, the output and label won’t be returned
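The relationship between load_batch, batch_data_process_func and the abstract forward_backward_step can be sketched with plain Python. This is an illustration only, not the library's implementation: ToyBaseSchedule and ForwardOnlySchedule are hypothetical names, the "engine" is just a callable, and GPU placement is left out.

```python
from abc import ABC, abstractmethod


class ToyBaseSchedule(ABC):
    """Hypothetical stand-in for a schedule base class; not colossalai code."""

    def __init__(self, batch_data_process_func=None):
        # Optional hook applied to every raw batch inside load_batch.
        self.batch_data_process_func = batch_data_process_func

    def load_batch(self, data_iter):
        # Pull the next batch, preprocess it if a hook was given,
        # then split it into (data, label).
        batch = next(data_iter)
        if self.batch_data_process_func is not None:
            batch = self.batch_data_process_func(batch)
        data, label = batch
        return data, label

    @abstractmethod
    def forward_backward_step(self, engine, data_iter, forward_only,
                              return_loss=True, return_output_label=True):
        """Process one batch; each concrete schedule defines how."""


class ForwardOnlySchedule(ToyBaseSchedule):
    """Smallest possible concrete schedule: forward pass, no loss."""

    def forward_backward_step(self, engine, data_iter, forward_only=True,
                              return_loss=True, return_output_label=True):
        data, label = self.load_batch(data_iter)
        output = engine(data)  # assumption: the toy engine is a plain callable
        return output, label, None
```

Because forward_backward_step is abstract, the base class cannot be instantiated directly; only subclasses that implement it can.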
- class colossalai.engine.schedule.NonPipelineSchedule(batch_data_process_func=None)
A helper schedule class for running environments without pipeline parallelism. During one process, it loads a batch of the dataset and feeds it to the model. After getting the output and calculating the loss, it will use step() to update the parameters if it is in training mode.
- forward_backward_step(engine, data_iter, forward_only=False, return_loss=True, return_output_label=True)
The process function that loads a batch of the dataset and feeds it to the model. The returned labels and loss will be None if return_loss is False.
- Parameters
engine (colossalai.engine.Engine) – Engine holding the model for training and inference
data_iter (Iterator) – Data iterator of the dataloader, e.g. iter(dataloader)
forward_only (bool, optional) – If True, only the forward pass is run; otherwise back propagation is also executed
return_loss (bool, optional) – Loss will be returned if True
return_output_label (bool, optional) – Output and label will be returned if True
- Returns
(output, label, loss)
- Return type
Tuple[torch.Tensor]
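The effect of the three flags on the returned (output, label, loss) tuple can be sketched as follows. This is a minimal illustration that follows the per-parameter descriptions above, not the library's code: toy_step, model, criterion and backward_fn are hypothetical names, and scalars stand in for tensors.

```python
def toy_step(model, criterion, data_iter, forward_only=False,
             return_loss=True, return_output_label=True, backward_fn=None):
    """Hypothetical single-batch step mirroring the documented flags."""
    data, label = next(data_iter)
    output = model(data)
    loss = criterion(output, label)

    # The backward pass is skipped entirely when forward_only is True.
    if not forward_only and backward_fn is not None:
        backward_fn(loss)

    # Each flag replaces its part of the result with None when disabled.
    return (
        output if return_output_label else None,
        label if return_output_label else None,
        loss if return_loss else None,
    )
```

A typical evaluation call in this sketch would pass forward_only=True, so that backward_fn is never invoked.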
- class colossalai.engine.schedule.PipelineSchedule(num_microbatches, batch_data_process_func=None, tensor_shape=None, scatter_gather_tensors=False)
A helper schedule class for running environments with pipeline parallelism. It uses the non-interleaved 1F1B strategy. Other properties are similar to those of NonPipelineSchedule.
- Parameters
num_microbatches (int) – The number of microbatches
batch_data_process_func (Callable, optional) – The preprocessing function, which receives a batch of data and is executed in load_batch
tensor_shape (torch.Size, optional) – Specified shape in pipeline communication
scatter_gather_tensors (bool, optional) – If set to True, communication will be reduced over pipeline when using 1D tensor parallelization
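To make the non-interleaved 1F1B (one forward, one backward) strategy concrete, the sketch below lists the order of forward ("F") and backward ("B") steps that a single stage would run. It is an illustration in plain Python, not the library's code: one_f_one_b_order is a hypothetical helper, and the warm-up count used here (number of stages after this one, capped at num_microbatches) is an assumption taken from the way this strategy is commonly described.

```python
def one_f_one_b_order(num_stages, stage, num_microbatches):
    """Return the ("F" | "B", microbatch_index) steps for one pipeline stage."""
    # Warm-up: earlier stages run extra forwards so later stages have input.
    warmup = min(num_stages - stage - 1, num_microbatches)
    steps = [("F", i) for i in range(warmup)]

    # Steady state: alternate one forward with one backward.
    fwd, bwd = warmup, 0
    while fwd < num_microbatches:
        steps.append(("F", fwd))
        fwd += 1
        steps.append(("B", bwd))
        bwd += 1

    # Cool-down: drain the backwards still outstanding from warm-up.
    while bwd < num_microbatches:
        steps.append(("B", bwd))
        bwd += 1
    return steps
```

In this sketch the last stage alternates forward and backward from the start, while the first stage runs the most warm-up forwards. Every stage performs exactly num_microbatches forwards and num_microbatches backwards.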
- forward_step(engine, input_tensor, return_tensors, return_output_label=True, accum_loss=None)
Forward step for passed-in model. If it is the first stage, the input tensor is obtained from data_iterator, otherwise the passed-in input_tensor is used. Returns output tensor. This is a helper function and can be ignored by users.
- Parameters
engine (colossalai.engine.Engine) – Your engine object
input_tensor (torch.Tensor) – Input tensor for this pipeline stage
return_tensors (List[torch.Tensor]) – A list of tensors to return
return_output_label (bool, optional) – Whether to return the output and labels
accum_loss (optional) – Where the accumulated loss is stored
- Returns
output or the loss value of the current pipeline stage
- Return type
torch.Tensor
- backward_step(engine, input_tensor, output_tensor, output_tensor_grad)
Backward step through the passed-in output tensor. If it is the last stage, output_tensor_grad is None; otherwise it is the gradient with respect to the stage's output tensor. Returns the gradient with respect to the input tensor (None if it is the first stage). This is a helper function and can be ignored by users.
- Parameters
engine (colossalai.engine.Engine) – your engine object
input_tensor (torch.Tensor) – input tensor for this pipeline stage
output_tensor (torch.Tensor) – output tensor for this pipeline stage
output_tensor_grad (torch.Tensor) – gradient of output tensor for this pipeline stage
- Returns
gradient of input tensor
- Return type
torch.Tensor
- forward_backward_step(engine, data_iter, forward_only=False, return_loss=True, return_output_label=True)
Runs the non-interleaved 1F1B schedule, with communication between pipeline stages. Returns a tuple with losses if this is the last stage, and an empty tuple otherwise.
- Parameters
engine (colossalai.engine.Engine) – Your engine object
data_iter (Iterable) – Dataloader in the form of an iterator, obtained by calling iter(dataloader)
forward_only (bool) – Whether to run the forward step only. Default is False. If True, no backward pass is run.
return_loss (bool) – Whether to return the loss value. Default is True.
return_output_label (bool) – If False, the output and label won’t be returned
- Returns
(output, label, loss)
- Return type
Tuple[torch.Tensor]
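The role of num_microbatches and of an accumulated loss can be illustrated with plain lists. This is a sketch under stated assumptions, not the library's code: both helpers are hypothetical, the batch is assumed to divide evenly, and each microbatch loss is assumed to be scaled by 1 / num_microbatches so that the accumulated value equals the full-batch mean.

```python
def split_into_microbatches(batch, num_microbatches):
    """Split a list-like batch into equally sized microbatches."""
    if len(batch) % num_microbatches != 0:
        raise ValueError("batch size must be divisible by num_microbatches")
    size = len(batch) // num_microbatches
    return [batch[i * size:(i + 1) * size] for i in range(num_microbatches)]


def accumulate_mean_loss(per_sample_losses, num_microbatches):
    """Accumulate scaled microbatch losses into a single batch-level loss."""
    accum_loss = 0.0
    for micro in split_into_microbatches(per_sample_losses, num_microbatches):
        micro_loss = sum(micro) / len(micro)        # mean loss of one microbatch
        accum_loss += micro_loss / num_microbatches  # scale, then accumulate
    return accum_loss
```

With equally sized microbatches, the accumulated value matches the mean loss over the whole batch, which is why the scaling factor is applied before accumulation in this sketch.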