colossalai.engine.schedule

class colossalai.engine.schedule.BaseSchedule(batch_data_process_func=None)

A basic helper class that controls the process of training or evaluation. It mainly consists of forward_backward_step for the backward pass and optimizer_step for updating parameters. To make FP16 easy to enable, all code that controls FP16 is aggregated in the schedule class.

load_batch(data_iter, to_gpu=True)

Loads a batch from the data iterator. Returns the data and labels; if to_gpu is True, they are already placed on the same GPU as the model.

Parameters
  • data_iter (DataIter) – Data iterator from which to get a batch of data

  • to_gpu (bool, optional) – Whether the data should be moved to GPU

Returns

(data, label)

Return type

(torch.Tensor, torch.Tensor)
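The load_batch flow (fetch the next batch, optionally run batch_data_process_func on it, split it into data and label, optionally move both to the device) can be sketched in plain Python. This is a hypothetical illustration, not ColossalAI's code: the function is standalone, it assumes a batch without a preprocessing hook is a (data, label) pair, and the `move_to_device` callable is an invented stand-in for `to_gpu=True` so that the sketch runs without PyTorch or a GPU.

```python
from typing import Any, Callable, Iterator, Optional, Tuple


def load_batch(
    data_iter: Iterator[Any],
    batch_data_process_func: Optional[Callable[[Any], Tuple[Any, Any]]] = None,
    move_to_device: Optional[Callable[[Any], Any]] = None,
) -> Tuple[Any, Any]:
    """Hypothetical sketch of load_batch: fetch, preprocess, split, move."""
    # Pull the next batch; raises StopIteration when the iterator is exhausted.
    batch = next(data_iter)
    if batch_data_process_func is not None:
        # User hook, mirroring the batch_data_process_func constructor argument.
        data, label = batch_data_process_func(batch)
    else:
        # Assumption: without a hook, each batch is a (data, label) pair.
        data, label = batch
    # Stand-in for to_gpu=True; real code would call tensor.cuda() or tensor.to(device).
    if move_to_device is not None:
        data, label = move_to_device(data), move_to_device(label)
    return data, label
```

For example, `load_batch(iter([{"x": 5, "y": 6}]), batch_data_process_func=lambda b: (b["x"], b["y"]))` returns `(5, 6)`, which shows how a hook adapts a dict-style batch to the (data, label) form.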

pre_processing(engine)

Performs actions before running the schedule.

abstract forward_backward_step(engine, data_iter, forward_only, return_loss=True, return_output_label=True)

Processes a batch of data for training or evaluation.

Parameters
  • engine (colossalai.engine.Engine) – ColossalAI training engine

  • data_iter (DataIter) – Data iterator from which to get a batch of data

  • forward_only (bool) – If True, the backward pass is skipped

  • return_loss (bool, optional) – If False, the loss won’t be returned

  • return_output_label (bool, optional) – If False, the output and label won’t be returned
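The contract described above, an optional pre_processing hook plus an abstract forward_backward_step that every concrete schedule must implement, can be sketched with Python's abc module. `MiniSchedule` and `EchoSchedule` are hypothetical names used only for illustration; this is a simplified mirror of the interface, not ColossalAI's implementation, and the toy subclass uses an identity "model" with a constant loss just to show how return_loss and return_output_label shape the result.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Iterator, Optional, Tuple


class MiniSchedule(ABC):
    """Hypothetical, simplified mirror of the BaseSchedule interface."""

    def __init__(self, batch_data_process_func: Optional[Callable[[Any], Tuple[Any, Any]]] = None):
        # Optional preprocessing hook, as in the documented constructor signature.
        self.batch_data_process_func = batch_data_process_func

    def pre_processing(self, engine: Any) -> None:
        """Hook run before the schedule starts; does nothing by default."""

    @abstractmethod
    def forward_backward_step(self, engine: Any, data_iter: Iterator[Any], forward_only: bool,
                              return_loss: bool = True, return_output_label: bool = True):
        """Process one batch; concrete schedules must implement this."""


class EchoSchedule(MiniSchedule):
    """Toy subclass: the 'model' is the identity and the loss is always 0.0."""

    def __init__(self):
        super().__init__()
        self.steps = 0

    def forward_backward_step(self, engine, data_iter, forward_only,
                              return_loss=True, return_output_label=True):
        data, label = next(data_iter)
        self.steps += 1
        output = data  # identity "forward pass"
        loss = 0.0 if return_loss else None
        # A real schedule would run the backward pass here unless forward_only is True;
        # this toy has no parameters, so there is nothing to differentiate.
        if not return_output_label:
            output, label = None, None
        return output, label, loss
```

Because forward_backward_step is abstract, `MiniSchedule()` raises TypeError; only subclasses that implement it can be instantiated.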

class colossalai.engine.schedule.NonPipelineSchedule(batch_data_process_func=None)

A helper schedule class for environments without pipeline parallelism. In each step, it loads a batch of data and feeds it to the model. After getting the output and calculating the loss, it uses step() to update the parameters if it is in training mode.

forward_backward_step(engine, data_iter, forward_only=False, return_loss=True, return_output_label=True)

The process function that loads a batch of data and feeds it to the model. The returned labels and loss will be None if return_loss is False.

Parameters
  • engine (colossalai.engine.Engine) – Engine wrapping the model for training and inference

  • data_iter (Iterator) – Data iterator of the dataloader, e.g. iter(dataloader)

  • forward_only (bool, optional) – If True, only the forward pass is run; otherwise, back propagation is also executed

  • return_loss (bool, optional) – Loss will be returned if True

  • return_output_label (bool, optional) – Output and label will be returned if True

Returns

(output, label, loss)

Return type

Tuple[torch.Tensor]
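The non-pipeline flow (load one batch, run the forward pass, compute the loss, run the backward pass only in training mode, then return (output, label, loss) with entries set to None according to the flags) can be illustrated with a scalar model. `ScalarEngine` and `non_pipeline_step` are hypothetical stand-ins, not ColossalAI classes: the "model" is y = w * x with a squared-error loss and a hand-written gradient instead of autograd. In this sketch the parameter update is a separate `step()` call made after the step function returns.

```python
from typing import Iterator, Optional, Tuple


class ScalarEngine:
    """Hypothetical stand-in for an engine: model y = w * x, squared-error loss, SGD."""

    def __init__(self, w: float, lr: float = 0.1):
        self.w = w
        self.lr = lr
        self.grad = 0.0

    def forward(self, x: float) -> float:
        return self.w * x

    def criterion(self, output: float, label: float) -> float:
        return (output - label) ** 2

    def backward(self, x: float, output: float, label: float) -> None:
        # d/dw (w * x - label)^2 = 2 * (w * x - label) * x, accumulated like a .grad field
        self.grad += 2.0 * (output - label) * x

    def step(self) -> None:
        # Plain SGD update, then clear the accumulated gradient.
        self.w -= self.lr * self.grad
        self.grad = 0.0


def non_pipeline_step(engine: ScalarEngine, data_iter: Iterator[Tuple[float, float]],
                      forward_only: bool = False, return_loss: bool = True,
                      return_output_label: bool = True
                      ) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    x, label = next(data_iter)             # load one batch
    output = engine.forward(x)             # forward pass
    loss = engine.criterion(output, label)
    if not forward_only:
        engine.backward(x, output, label)  # backward pass only in training mode
    if not return_loss:
        loss = None
    if not return_output_label:
        output, label = None, None
    return output, label, loss
```

With `ScalarEngine(w=1.0, lr=0.5)` and the batch `(2.0, 3.0)`, the step returns `(2.0, 3.0, 1.0)` and accumulates a gradient of -4.0; calling `step()` then moves w to 3.0.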

class colossalai.engine.schedule.PipelineSchedule(num_microbatches, batch_data_process_func=None, tensor_shape=None, scatter_gather_tensors=False)

A helper schedule class for environments with pipeline parallelism. It uses the non-interleaved 1F1B (one-forward-one-backward) strategy. Other properties are similar to those of NonPipelineSchedule.

Parameters
  • num_microbatches (int) – The number of microbatches

  • batch_data_process_func (Callable, optional) – A preprocessing function that receives a batch of data; it is executed in load_batch

  • tensor_shape (torch.Size, optional) – Shape of the tensors exchanged in pipeline communication

  • scatter_gather_tensors (bool, optional) – If set to True, communication between pipeline stages is reduced when 1D tensor parallelism is used
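With num_microbatches, each batch is processed as several smaller microbatches that flow through the pipeline stages one after another. A minimal sketch of splitting one batch along its first dimension, assuming the batch size must be divisible by num_microbatches; `split_into_microbatches` is a hypothetical helper, not a ColossalAI function, and it works on plain Python sequences rather than tensors.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_microbatches(batch: Sequence[T], num_microbatches: int) -> List[Sequence[T]]:
    """Hypothetical helper: split one batch into equal-size microbatches along dim 0."""
    if num_microbatches <= 0:
        raise ValueError("num_microbatches must be positive")
    if len(batch) % num_microbatches != 0:
        # Assumption: uneven splits are rejected rather than padded.
        raise ValueError(f"batch size {len(batch)} is not divisible by {num_microbatches}")
    size = len(batch) // num_microbatches
    return [batch[i * size:(i + 1) * size] for i in range(num_microbatches)]
```

For a batch of 8 samples and num_microbatches=4, each microbatch holds 2 samples.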

forward_step(engine, input_tensor, return_tensors, return_output_label=True, accum_loss=None)

Forward step for the passed-in model. If this is the first stage, the input tensor is obtained from data_iterator; otherwise, the passed-in input_tensor is used. Returns the output tensor. This is a helper function and can be ignored by users.

Parameters
  • engine (colossalai.engine.Engine) – Your engine object

  • input_tensor (torch.Tensor) – Input tensor for this pipeline stage

  • return_tensors (List[torch.Tensor]) – A list of tensors to return

  • return_output_label (bool, optional) – Whether to return the output and label

  • accum_loss (optional) – Where the accumulated loss is stored

Returns

output or the loss value of the current pipeline stage

Return type

torch.Tensor

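The per-stage logic described for forward_step (the first stage reads its input from the data, later stages use the received input_tensor, and the last stage turns its output into a loss that is added to accum_loss) can be sketched with scalars. Everything here is hypothetical: stages are plain Python functions, the loss is squared error, a one-element list stands in for the accum_loss tensor, and dividing the loss by num_microbatches (so the accumulated value is a mean over microbatches) is an assumption of this sketch.

```python
from typing import Callable, List, Optional, Tuple


def forward_step(stage_fn: Callable[[float], float],
                 input_tensor: Optional[float],
                 data: Optional[float],
                 label: Optional[float],
                 is_first_stage: bool,
                 is_last_stage: bool,
                 return_tensors: List[Tuple[float, float]],
                 num_microbatches: int,
                 accum_loss: Optional[List[float]] = None,
                 return_output_label: bool = True) -> float:
    """Hypothetical scalar version of one pipeline stage's forward step."""
    # First stage reads the microbatch; other stages use the tensor received from upstream.
    x = data if is_first_stage else input_tensor
    output = stage_fn(x)
    if not is_last_stage:
        return output  # would be sent to the next stage
    if return_output_label:
        return_tensors.append((output, label))
    # Assumption: scale by num_microbatches so the summed loss is a mean over microbatches.
    loss = (output - label) ** 2 / num_microbatches
    if accum_loss is not None:
        accum_loss[0] += loss  # one-element list stands in for an accumulator tensor
    return loss
```

In a two-stage pipeline with stages `2 * x` and `x + 1`, data 1.0 and label 4.0, the first stage returns 2.0, and the last stage returns a loss of (3.0 - 4.0)^2 / 2 = 0.5 when num_microbatches is 2.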
backward_step(engine, input_tensor, output_tensor, output_tensor_grad)

Backward step through the passed-in output tensor. If this is the last stage, output_tensor_grad is None; otherwise, it is the gradient with respect to the stage's output tensor. Returns the gradient with respect to the input tensor (None for the first stage). This is a helper function and can be ignored by users.

Parameters
  • engine (colossalai.engine.Engine) – Your engine object

  • input_tensor (torch.Tensor) – Input tensor for this pipeline stage

  • output_tensor (torch.Tensor) – Output tensor for this pipeline stage

  • output_tensor_grad (torch.Tensor) – Gradient of the output tensor for this pipeline stage

Returns

gradient of input tensor

Return type

torch.Tensor
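The backward rule above is the chain rule applied stage by stage: the last stage starts from d(loss)/d(loss) = 1 because no output gradient is received, and every other stage multiplies the received output gradient by its local derivatives. A minimal scalar sketch, assuming each stage computes output = w * input_tensor and that the last stage's output is itself the loss; the function and its `w` and `is_first_stage` parameters are hypothetical, and a real implementation would rely on autograd instead of hand-written derivatives.

```python
from typing import Optional, Tuple


def backward_step(w: float, input_tensor: float, output_tensor_grad: Optional[float],
                  is_first_stage: bool) -> Tuple[float, Optional[float]]:
    """Hypothetical scalar backward for a stage computing output = w * input_tensor.

    Returns (gradient w.r.t. w, gradient w.r.t. input_tensor or None on the first stage).
    """
    # Last stage: its output is treated as the loss itself, so d(loss)/d(output) = 1.
    upstream = 1.0 if output_tensor_grad is None else output_tensor_grad
    grad_w = upstream * input_tensor  # chain rule: d(w * x)/dw = x
    # The first stage has no upstream stage to send a gradient to.
    grad_input = None if is_first_stage else upstream * w  # d(w * x)/dx = w
    return grad_w, grad_input
```

For two stages with w0 = 2.0, w1 = 3.0 and x = 1.5, the loss is w1 * w0 * x = 9.0. The last stage receives no gradient and returns (3.0, 3.0); the first stage receives 3.0 and returns (4.5, None), matching d(loss)/d(w0) = w1 * x = 4.5.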

forward_backward_step(engine, data_iter, forward_only=False, return_loss=True, return_output_label=True)

Runs the non-interleaved 1F1B schedule, with communication between pipeline stages. Returns a tuple with the losses on the last stage, and an empty tuple on other stages.

Parameters
  • engine (colossalai.engine.Engine) – Your engine object

  • data_iter (Iterable) – Dataloader in the form of an iterator, obtained by calling iter(dataloader)

  • forward_only (bool) – Whether to run the forward step only. Default is False. If True, no backward pass is run.

  • return_loss (bool) – Whether to return the loss value. Default is True.

  • return_output_label (bool) – If False, the output and label won’t be returned

Returns

(output, label, loss)

Return type

Tuple[torch.Tensor]
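The non-interleaved 1F1B order on a single stage can be computed without any communication: a warm-up phase of forward passes, a steady phase alternating one forward with one backward, and a cool-down phase that drains the remaining backward passes. The sketch below follows the common Megatron-LM-style formulation, in which stage s of p stages runs min(p - s - 1, num_microbatches) warm-up forwards; the function name is hypothetical, and ColossalAI's implementation may differ in details such as when tensors are sent and received.

```python
from typing import List


def one_f_one_b_order(num_stages: int, stage: int, num_microbatches: int) -> List[str]:
    """Order of forward (F) and backward (B) passes on one stage under non-interleaved 1F1B."""
    # Warm-up: earlier stages run extra forwards so that later stages have work to do.
    num_warmup = min(num_stages - stage - 1, num_microbatches)
    num_steady = num_microbatches - num_warmup
    order: List[str] = []
    fwd = bwd = 0
    for _ in range(num_warmup):
        order.append(f"F{fwd}")
        fwd += 1
    # Steady state: alternate one forward with one backward.
    for _ in range(num_steady):
        order.append(f"F{fwd}")
        fwd += 1
        order.append(f"B{bwd}")
        bwd += 1
    # Cool-down: drain the backwards left over from the warm-up forwards.
    for _ in range(num_warmup):
        order.append(f"B{bwd}")
        bwd += 1
    return order
```

With 4 stages and 4 microbatches, the last stage alternates from the start (F0, B0, F1, B1, ...), while the first stage runs all four forwards before its first backward. Limiting in-flight microbatches this way bounds how many activations each stage must keep alive.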