colossalai.trainer.hooks

class colossalai.trainer.hooks.BaseHook(priority)[source]

This class allows users to add custom actions at specific points during training or evaluation.

Parameters

priority (int) – Priority used to order printed output; hooks with a smaller priority are printed first.

after_hook_is_attached(trainer)[source]

Actions after hooks are attached to trainer.

before_train(trainer)[source]

Actions before training.

after_train(trainer)[source]

Actions after training.

before_train_iter(trainer)[source]

Actions before running a training iteration.

after_train_iter(trainer, output, label, loss)[source]

Actions after running a training iteration.

Parameters
  • trainer (Trainer) – Trainer which is using this hook.

  • output (torch.Tensor) – Output of the model.

  • label (torch.Tensor) – Labels of the input data.

  • loss (torch.Tensor) – Loss computed from the model output and the labels.
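A minimal sketch of a custom hook that uses this callback to average the loss over an epoch. BaseHookSketch is a hypothetical stand-in so that the example runs without ColossalAI installed; real code would presumably subclass colossalai.trainer.hooks.BaseHook instead. RunningLossHook and its attributes are illustrative names, not part of the library.

```python
class BaseHookSketch:
    """Hypothetical stand-in for BaseHook; only the callbacks used below."""

    def __init__(self, priority):
        self.priority = priority

    def before_train_epoch(self, trainer):
        pass

    def after_train_iter(self, trainer, output, label, loss):
        pass


class RunningLossHook(BaseHookSketch):
    """Averages the loss over the iterations of the current epoch."""

    def __init__(self, priority=0):
        super().__init__(priority)
        self.total = 0.0
        self.count = 0

    def before_train_epoch(self, trainer):
        # Start each epoch with a clean accumulator.
        self.total = 0.0
        self.count = 0

    def after_train_iter(self, trainer, output, label, loss):
        # float() accepts a Python number or a one-element tensor.
        self.total += float(loss)
        self.count += 1

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


# Drive the hook by hand; a real trainer would make these calls itself.
hook = RunningLossHook()
hook.before_train_epoch(trainer=None)
for step_loss in (0.5, 1.0, 1.5):
    hook.after_train_iter(trainer=None, output=None, label=None, loss=step_loss)
mean_loss = hook.mean  # 1.0
```

The hook ignores trainer, output, and label here; they are passed only to match the callback signature listed above.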

before_train_epoch(trainer)[source]

Actions before starting a training epoch.

after_train_epoch(trainer)[source]

Actions after finishing a training epoch.

before_test(trainer)[source]

Actions before evaluation.

after_test(trainer)[source]

Actions after evaluation.

before_test_epoch(trainer)[source]

Actions before starting a testing epoch.

after_test_epoch(trainer)[source]

Actions after finishing a testing epoch.

before_test_iter(trainer)[source]

Actions before running a testing iteration.

after_test_iter(trainer, output, label, loss)[source]

Actions after running a testing iteration.

Parameters
  • trainer (Trainer) – Trainer which is using this hook.

  • output (torch.Tensor) – Output of the model.

  • label (torch.Tensor) – Labels of the input data.

  • loss (torch.Tensor) – Loss computed from the model output and the labels.

init_runner_states(trainer, key, val)[source]

Initializes the trainer's state.

Parameters
  • trainer (Trainer) – Trainer which is using this hook.

  • key – Key of the state to be reset.

  • val – Value of the state to be reset.
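One plausible reading of this method, sketched under assumptions: the trainer keeps shared state in a dict (called states here, a hypothetical attribute name), and the value is only written when the key is not already present. Neither detail is confirmed by the text above.

```python
class TrainerSketch:
    """Hypothetical stand-in for a trainer that stores state in a dict."""

    def __init__(self):
        self.states = {}


def init_runner_states(trainer, key, val):
    # Assumption: an existing entry is left untouched so that a state
    # created by an earlier hook is not overwritten.
    if key not in trainer.states:
        trainer.states[key] = val


trainer = TrainerSketch()
init_runner_states(trainer, "metrics", {"train": {}, "test": {}})
init_runner_states(trainer, "metrics", {})  # no effect, key already exists
```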

class colossalai.trainer.hooks.MetricHook(priority)[source]

Specialized hook class for metrics. Some metric hooks help metric collectors initialize, reset, and update their states; others display and record the metrics.

Parameters

priority (int) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 1. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.
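The ordering rule described above matches what a stable sort on priority produces. The sketch below uses a hypothetical Hook tuple rather than real hook objects; how the trainer orders hooks internally is not stated here.

```python
from collections import namedtuple

# Hypothetical stand-in carrying only a name and a priority.
Hook = namedtuple("Hook", ["name", "priority"])

hook_list = [
    Hook("save", 10),
    Hook("loss", 0),
    Hook("lr", 1),
    Hook("accuracy", 0),
]

# sorted() is stable: hooks with equal priority keep their list order.
ordered = sorted(hook_list, key=lambda h: h.priority)
names = [h.name for h in ordered]  # ['loss', 'accuracy', 'lr', 'save']
```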

class colossalai.trainer.hooks.LoadCheckpointHook(checkpoint_dir=None, epoch=-1, finetune=False, strict=False, suffix='', priority=0)[source]

Loads the model before the training process starts.

Parameters
  • checkpoint_dir (str, optional) – Directory where checkpoints are saved, defaults to None.

  • epoch (int, optional) – Epoch number of the checkpoint to load, defaults to -1. A value of -1 selects the latest checkpoint.

  • finetune (bool, optional) – Whether to allow loading only part of the model, defaults to False.

  • strict (bool, optional) – Whether to strictly enforce that the keys in the checkpoint's state_dict match the names of the parameters and buffers in the model, defaults to False.

  • suffix (str, optional) – Suffix of the checkpoint file path, defaults to ''.

  • priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 0. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.

before_train(trainer)[source]

Loads parameters to the model before training.
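The meaning of epoch=-1 can be illustrated with a small helper. This is only a sketch: resolve_epoch and available_epochs are hypothetical names, and how the hook actually discovers checkpoints on disk is not described here.

```python
def resolve_epoch(requested_epoch, available_epochs):
    """Return the epoch to load; -1 means the most recent one available."""
    if not available_epochs:
        raise FileNotFoundError("no checkpoints available")
    if requested_epoch == -1:
        # Latest checkpoint = highest epoch number seen so far.
        return max(available_epochs)
    if requested_epoch not in available_epochs:
        raise FileNotFoundError(f"no checkpoint for epoch {requested_epoch}")
    return requested_epoch


latest = resolve_epoch(-1, [0, 1, 4])   # 4
specific = resolve_epoch(1, [0, 1, 4])  # 1
```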

class colossalai.trainer.hooks.SaveCheckpointHook(interval=1, checkpoint_dir=None, suffix='', priority=10)[source]

Saves the model at a fixed interval during training.

Parameters
  • interval (int, optional) – Saving interval, defaults to 1.

  • checkpoint_dir (str, optional) – Directory where checkpoints are saved, defaults to None.

  • suffix (str, optional) – Suffix of the saved file, defaults to ''.

  • priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.

after_train_epoch(trainer)[source]

Saves the model after a training epoch.
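A rough sketch of how an interval check at the end of each epoch might look. The helper name should_save and the zero-based epoch convention are assumptions; the exact off-by-one behavior of the hook is not specified here.

```python
def should_save(epoch, interval=1):
    """Decide whether to save after the given zero-based epoch."""
    # epoch + 1 is the number of completed epochs.
    return (epoch + 1) % interval == 0


# With interval=2 over six epochs, saving happens after epochs 1, 3 and 5.
saved_epochs = [e for e in range(6) if should_save(e, interval=2)]
```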

class colossalai.trainer.hooks.LossHook(priority=0)[source]

Specialized hook class for Loss.

Parameters

priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 0. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.

class colossalai.trainer.hooks.AccuracyHook(accuracy_func, priority=0)[source]

Specialized hook class for Accuracy.

Parameters
  • accuracy_func (typing.Callable) – Accuracy function for the classification task.

  • priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 0. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.
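For illustration, an accuracy function for classification could look like the sketch below. The signature (logits, targets) and the choice to return a count of correct predictions rather than a fraction are assumptions, and plain Python lists stand in for tensors so the example runs without PyTorch.

```python
def count_correct(logits, targets):
    """Count rows whose highest-scoring class equals the target class."""
    correct = 0
    for row, target in zip(logits, targets):
        # Index of the largest score in this row (argmax).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == target:
            correct += 1
    return correct


logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
targets = [1, 0, 0]
num_correct = count_correct(logits, targets)  # 2
```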

class colossalai.trainer.hooks.LogMetricByEpochHook(logger, interval=1, priority=10)[source]

Specialized hook to record metrics to the log.

Parameters
  • logger (colossalai.logging.DistributedLogger) – Logger for recording the log information.

  • interval (int, optional) – Interval at which log information is printed, defaults to 1.

  • priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.

class colossalai.trainer.hooks.TensorboardHook(log_dir, ranks=None, parallel_mode=ParallelMode.GLOBAL, priority=10)[source]

Specialized hook to record metrics to TensorBoard.

Parameters
  • log_dir (str) – Directory for the logs.

  • ranks (list, optional) – Ranks of processors, defaults to None.

  • parallel_mode (colossalai.context.parallel_mode.ParallelMode, optional) – Parallel mode used in trainer, defaults to colossalai.context.parallel_mode.ParallelMode.GLOBAL.

  • priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.

class colossalai.trainer.hooks.LogTimingByEpochHook(timer, logger, interval=1, priority=10, log_eval=True, ignore_num_train_steps=0)[source]

Specialized hook to write timing records to the log.

Parameters
  • timer (colossalai.utils.MultiTimer) – Timer for the hook.

  • logger (colossalai.logging.DistributedLogger) – Logger for recording the log information.

  • interval (int, optional) – Interval at which log information is printed, defaults to 1.

  • priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.

  • log_eval (bool, optional) – Whether to write logs during evaluation, defaults to True.

  • ignore_num_train_steps (int, optional) – Number of training steps to ignore, defaults to 0.

after_train_epoch(trainer)[source]

Writes log after finishing a training epoch.

after_test_epoch(trainer)[source]

Writes log after finishing a testing epoch.

class colossalai.trainer.hooks.LogMemoryByEpochHook(logger, interval=1, priority=10, log_eval=True, report_cpu=False)[source]

Specialized hook to write memory usage records to the log.

Parameters
  • logger (colossalai.logging.DistributedLogger) – Logger for recording the log information.

  • interval (int, optional) – Interval at which log information is printed, defaults to 1.

  • priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.

  • log_eval (bool, optional) – Whether to write logs during evaluation, defaults to True.

  • report_cpu (bool, optional) – Whether to report CPU memory usage, defaults to False.

before_train(trainer)[source]

Resets before training.

after_train_epoch(trainer)[source]

Writes log after finishing a training epoch.

after_test(trainer)[source]

Reports after testing.

class colossalai.trainer.hooks.LRSchedulerHook(lr_scheduler, by_epoch, store_lr_in_state=True, priority=1)[source]

Builds the LR scheduler for the trainer.

Parameters
  • lr_scheduler (colossalai.nn.lr_scheduler) – The LR scheduler to use, chosen from colossalai.nn.lr_scheduler; see lr_scheduler for more details.

  • by_epoch (bool) – If True, the LR is updated every epoch; otherwise, it is updated every batch.

  • store_lr_in_state (bool, optional) – If True, store the learning rate in each state, defaults to True.

  • priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 1. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.
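The effect of by_epoch can be pictured with a toy loop. CountingScheduler and run are hypothetical stand-ins that only count calls to step(); this is not the hook's actual implementation.

```python
class CountingScheduler:
    """Hypothetical scheduler that records how often step() is called."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


def run(by_epoch, epochs=2, iters_per_epoch=3):
    scheduler = CountingScheduler()
    for _ in range(epochs):
        for _ in range(iters_per_epoch):
            if not by_epoch:
                scheduler.step()  # update after every batch
        if by_epoch:
            scheduler.step()      # update once per epoch
    return scheduler.steps


per_epoch_steps = run(by_epoch=True)   # 2
per_batch_steps = run(by_epoch=False)  # 6
```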

class colossalai.trainer.hooks.ThroughputHook(ignored_steps=0, priority=10)[source]

Specialized hook class that measures execution throughput (samples/sec).

Parameters
  • ignored_steps (int, optional) – Number of initial training steps to ignore, defaults to 0.

  • priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.
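A sketch of why ignoring the first steps matters when measuring samples/sec: warm-up iterations are often much slower and would drag the average down. The throughput helper and the (num_samples, seconds) record format are assumptions made for this example.

```python
def throughput(step_records, ignored_steps=0):
    """Samples per second over the steps that remain after the warm-up.

    step_records is a list of (num_samples, seconds) pairs, one per step.
    """
    measured = step_records[ignored_steps:]
    total_samples = sum(n for n, _ in measured)
    total_time = sum(t for _, t in measured)
    if total_time <= 0:
        return 0.0
    return total_samples / total_time


# The first step is slow (warm-up); the following two are steady state.
records = [(32, 4.0), (32, 0.5), (32, 0.5)]
with_warmup = throughput(records)                      # 96 samples / 5.0 s
without_warmup = throughput(records, ignored_steps=1)  # 64 samples / 1.0 s
```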

class colossalai.trainer.hooks.LogMetricByStepHook(priority=10)[source]

Hook to log metrics by step.

Parameters

priority (int, optional) – Priority used to order printed output; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order in which they appear in the hook list.