colossalai.engine.schedule
- class colossalai.engine.schedule.BaseSchedule(batch_data_process_func=None)[source]
A basic helper class that controls the process of training or evaluation. It mainly consists of forward_backward_step, which computes the gradients, and optimizer_step, which updates the parameters. To make FP16 easy to enable, all code that controls FP16 is gathered in the schedule class.
- Parameters
batch_data_process_func (Callable, optional) – The preprocessing function which receives a batch of data, and it will be executed in load_batch.
- load_batch(data_iter, to_gpu=True)[source]
Loads a batch from the data iterator. It returns the data and labels, which are already on the same GPU as the model.
- Parameters
data_iter (Iterable) – Data iterator from which to get a batch of data, obtained by calling iter(dataloader).
to_gpu (bool, optional) – Whether the data should be moved to the GPU.
- Returns
A tuple of (data, label).
- Return type
Tuple[torch.Tensor, torch.Tensor]
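To illustrate how batch_data_process_func and load_batch fit together, here is a minimal pure-Python sketch. The names `split_batch` and `load_batch_sketch` are hypothetical and not part of Colossal-AI. The sketch assumes the hook returns a (data, label) pair and leaves out the move to the GPU.

```python
def split_batch(batch):
    """Hypothetical preprocessing hook: turn a dict batch into (data, label)."""
    return batch["x"], batch["y"]


def load_batch_sketch(data_iter, batch_data_process_func=None):
    """Toy stand-in for load_batch: fetch one batch, then apply the optional hook."""
    batch = next(data_iter)
    if batch_data_process_func is not None:
        return batch_data_process_func(batch)
    # Assumption: without a hook, the batch is already a (data, label) pair.
    data, label = batch
    return data, label


# Toy "dataloader": any iterable of batches works for this sketch.
dataloader = [{"x": [1.0, 2.0], "y": [0, 1]}, {"x": [3.0, 4.0], "y": [1, 0]}]
data_iter = iter(dataloader)  # the schedule takes an iterator, not the dataloader
first_data, first_label = load_batch_sketch(data_iter, split_batch)
```

Each call consumes one batch from the iterator, so the next call returns the second batch.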
- abstract forward_backward_step(engine, data_iter, forward_only, return_loss=True, return_output_label=True)[source]
The function that processes one batch of data for training or evaluation.
- Parameters
engine (colossalai.engine.Engine) – Colossalai engine for training and inference.
data_iter (Iterable) – Data iterator from which to get a batch of data, obtained by calling iter(dataloader).
forward_only (bool) – If True, the process won’t include backward.
return_loss (bool, optional) – If False, the loss won’t be returned.
return_output_label (bool, optional) – If False, the output and label won’t be returned.
- class colossalai.engine.schedule.NonPipelineSchedule(batch_data_process_func=None)[source]
A helper schedule class for environments that run without pipeline parallelism. In each step, it loads a batch of data and feeds it to the model. After getting the output and calculating the loss, it will use step() to update the parameters if it is in training mode.
- Parameters
batch_data_process_func (Callable, optional) – The preprocessing function which receives a batch of data, and it will be executed in load_batch.
- forward_backward_step(engine, data_iter, forward_only=False, return_loss=True, return_output_label=True)[source]
The function that loads a batch of data and feeds it to the model. The returned labels and loss will be None if return_loss is False.
- Parameters
engine (colossalai.engine.Engine) – Colossalai engine for training and inference.
data_iter (Iterable) – Dataloader in the form of an iterator, obtained by calling iter(dataloader).
forward_only (bool, optional) – If True, the model is run for the forward pass, else back propagation will be executed.
return_loss (bool, optional) – Loss will be returned if True.
return_output_label (bool, optional) – Output and label will be returned if True.
- Returns
A tuple of (output, label, loss); loss and label could be None.
- Return type
Tuple[torch.Tensor]
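The control flow described for forward_backward_step can be sketched with a toy engine. This is not the Colossal-AI implementation: `ToyEngine` and `forward_backward_step_sketch` are hypothetical, the model is a single scalar weight, and the handling of the flags is an assumption based only on the parameter descriptions above.

```python
class ToyEngine:
    """Hypothetical stand-in for an engine: a one-weight linear model with MSE loss."""

    def __init__(self, weight):
        self.weight = weight
        self.grad = None  # filled in by backward()

    def forward(self, data):
        return [self.weight * x for x in data]

    def criterion(self, output, label):
        # Mean squared error over the batch.
        return sum((o - t) ** 2 for o, t in zip(output, label)) / len(label)

    def backward(self, data, output, label):
        # Analytic gradient of the MSE loss with respect to the weight.
        terms = (2 * (o - t) * x for o, t, x in zip(output, label, data))
        self.grad = sum(terms) / len(label)


def forward_backward_step_sketch(engine, data_iter, forward_only=False,
                                 return_loss=True, return_output_label=True):
    data, label = next(data_iter)              # one batch per call
    output = engine.forward(data)
    loss = engine.criterion(output, label) if return_loss else None
    if not forward_only:                       # training: also run backward
        engine.backward(data, output, label)
    if return_output_label:
        return output, label, loss
    return None, None, loss


batches = [([1.0, 2.0], [2.0, 5.0])]
train_engine = ToyEngine(2.0)
train_result = forward_backward_step_sketch(train_engine, iter(batches))

eval_engine = ToyEngine(2.0)
eval_result = forward_backward_step_sketch(eval_engine, iter(batches),
                                           forward_only=True,
                                           return_output_label=False)
```

In the evaluation call no gradient is produced, and only the loss slot of the returned tuple is populated.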
- class colossalai.engine.schedule.PipelineSchedule(num_microbatches, batch_data_process_func=None, tensor_shape=None, scatter_gather_tensors=False)[source]
A helper schedule class for environments that run with pipeline parallelism. It uses the non-interleaved 1F1B strategy. Other properties are similar to those of NonPipelineSchedule.
- Parameters
num_microbatches (int) – The number of microbatches.
batch_data_process_func (Callable, optional) – The preprocessing function which receives a batch of data, and it will be executed in load_batch.
tensor_shape (torch.Size, optional) – Specified shape in pipeline communication.
scatter_gather_tensors (bool, optional) – If set to True, communication between pipeline stages is reduced when using 1D tensor parallelism.
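As a rough illustration of what num_microbatches means, the sketch below cuts one batch into equally sized microbatches. The helper `split_into_microbatches` is hypothetical, and the requirement that the batch size be divisible by num_microbatches is an assumption of this sketch rather than something stated on this page.

```python
def split_into_microbatches(batch, num_microbatches):
    """Toy helper: cut a list-like batch into equally sized microbatches."""
    if len(batch) % num_microbatches != 0:
        # Assumption of this sketch: only even splits are allowed.
        raise ValueError("batch size must be divisible by num_microbatches")
    size = len(batch) // num_microbatches
    return [batch[i * size:(i + 1) * size] for i in range(num_microbatches)]


microbatches = split_into_microbatches(list(range(8)), 4)
```

Each microbatch is then pushed through the pipeline stages separately, which is what lets different stages work at the same time.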
- forward_step(engine, input_tensor, return_tensors, return_output_label=True, accum_loss=None)[source]
Forward step for the passed-in model. If it is the first stage, the input tensor is obtained from data_iterator; otherwise the passed-in input_tensor is used. Returns the output tensor. This is a helper function and can be ignored by users.
- Parameters
engine (colossalai.engine.Engine) – Colossalai engine for training and inference.
input_tensor (torch.Tensor) – Input tensor for this pipeline stage.
return_tensors (List[torch.Tensor]) – A list of tensors to return.
return_output_label (bool, optional) – Whether to return output labels.
accum_loss (optional) – Where the accumulated loss is stored.
- Returns
The output or the loss value of the current pipeline stage.
- Return type
torch.Tensor
- backward_step(engine, input_tensor, output_tensor, output_tensor_grad)[source]
Backward step through the passed-in output tensor. If it is the last stage, output_tensor_grad is None; otherwise it is the gradient with respect to the stage’s output tensor. Returns the gradient with respect to the input tensor (None if first stage). This is a helper function and can be ignored by users.
- Parameters
engine (colossalai.engine.Engine) – Colossalai engine for training and inference.
input_tensor (torch.Tensor) – Input tensor for this pipeline stage.
output_tensor (torch.Tensor) – Output tensor for this pipeline stage.
output_tensor_grad (torch.Tensor) – Gradient of the output tensor for this pipeline stage.
- Returns
Gradient of the input tensor.
- Return type
torch.Tensor
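The gradient hand-off described for backward_step can be illustrated with scalars instead of tensors. This is a sketch under stated assumptions: `backward_step_sketch` is a hypothetical function, each stage is a single multiplication by a weight, and the function also returns the weight gradient purely for illustration (the documented method returns only the input gradient).

```python
def backward_step_sketch(weight, input_value, output_grad, is_first_stage):
    """Toy backward for a stage computing output = weight * input_value."""
    # On the last stage the output is the loss, so there is no upstream
    # gradient; seed the chain rule with d(loss)/d(loss) = 1.
    if output_grad is None:
        output_grad = 1.0
    weight_grad = input_value * output_grad
    # The first stage has no previous stage to send a gradient to.
    input_grad = None if is_first_stage else weight * output_grad
    return input_grad, weight_grad


# Two-stage toy pipeline: hidden = 3.0 * x, loss = 0.5 * hidden.
x = 2.0
hidden = 3.0 * x
loss = 0.5 * hidden

# Last stage: output_tensor_grad is None; it returns the gradient for stage 0.
grad_for_stage0, last_weight_grad = backward_step_sketch(0.5, hidden, None, False)
# First stage: receives that gradient and returns None as its input gradient.
first_input_grad, first_weight_grad = backward_step_sketch(3.0, x, grad_for_stage0, True)
```

The value passed between the two calls plays the role of output_tensor_grad for the earlier stage.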
- forward_backward_step(engine, data_iter, forward_only=False, return_loss=True, return_output_label=True)[source]
Runs the non-interleaved 1F1B schedule, with communication between pipeline stages. Returns a tuple with losses if this is the last stage, and an empty tuple otherwise.
- Parameters
engine (colossalai.engine.Engine) – Colossalai engine for training and inference.
data_iter (Iterable) – Dataloader in the form of an iterator, obtained by calling iter(dataloader).
forward_only (bool, optional) – Whether to run the forward step only. Default is False. If True, no backward pass will be run.
return_loss (bool, optional) – Whether to return the loss value. Default is True.
return_output_label (bool, optional) – If False, the output and label won’t be returned.
- Returns
A tuple of (output, label, loss); loss and label could be None.
- Return type
Tuple[torch.Tensor]
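The order of work in a non-interleaved 1F1B schedule can be sketched as follows. The function `one_f_one_b_order` is hypothetical, and the warm-up count of num_stages - stage - 1 follows the common formulation of this schedule; this page does not confirm that PipelineSchedule computes it in exactly this way. Communication between stages is left out.

```python
def one_f_one_b_order(stage, num_stages, num_microbatches):
    """Return the ("F", i) / ("B", i) steps that one pipeline stage runs, in order."""
    # Earlier stages run more forward passes before their first backward pass.
    num_warmup = min(num_stages - stage - 1, num_microbatches)
    num_steady = num_microbatches - num_warmup
    steps = []
    fwd = bwd = 0
    for _ in range(num_warmup):      # warm-up: forward passes only
        steps.append(("F", fwd))
        fwd += 1
    for _ in range(num_steady):      # steady state: one forward, one backward
        steps.append(("F", fwd))
        fwd += 1
        steps.append(("B", bwd))
        bwd += 1
    for _ in range(num_warmup):      # cool-down: remaining backward passes
        steps.append(("B", bwd))
        bwd += 1
    return steps


last_stage = one_f_one_b_order(stage=3, num_stages=4, num_microbatches=4)
middle_stage = one_f_one_b_order(stage=2, num_stages=4, num_microbatches=4)
first_stage = one_f_one_b_order(stage=0, num_stages=4, num_microbatches=4)
```

With four stages and four microbatches, the last stage alternates forward and backward from the start, while the first stage runs all four forward passes before its first backward pass.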