colossalai.trainer.hooks

class colossalai.trainer.hooks.BaseHook(priority)

This class allows users to add desired actions at specific points during training or evaluation.

Parameters

priority (int) – Printing priority; hooks with a smaller priority value are printed first
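As a rough illustration of the priority rule, the sketch below orders a few hooks by ascending priority. `SketchHook` is a hypothetical stand-in, not the real `BaseHook`, and the sort is only an assumption about how "smaller priority comes first" might be applied.

```python
# Stand-in for illustration only; this is not colossalai.trainer.hooks.BaseHook.
class SketchHook:
    def __init__(self, name: str, priority: int):
        self.name = name
        self.priority = priority


# Priorities mirror the defaults listed in this reference (10, 0, 1).
hooks = [SketchHook("save", 10), SketchHook("loss", 0), SketchHook("lr", 1)]

# Assumption: hooks are ordered by ascending priority, so smaller values come first.
ordered = sorted(hooks, key=lambda hook: hook.priority)
ordered_names = [hook.name for hook in ordered]
print(ordered_names)  # ['loss', 'lr', 'save']
```

Because Python's `sorted` is stable, hooks that share a priority keep their original relative order in this sketch.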

after_hook_is_attached(trainer)

Actions after hooks are attached to trainer.

before_train(trainer)

Actions before training.

after_train(trainer)

Actions after training.

before_train_iter(trainer)

Actions before running a training iteration.

after_train_iter(trainer, output, label, loss)

Actions after running a training iteration.

Parameters
  • trainer (Trainer) – Trainer which is using this hook

  • output (torch.Tensor) – Output of the model

  • label (torch.Tensor) – Labels of the input data

  • loss (torch.Tensor) – Loss computed from the output and the labels

before_train_epoch(trainer)

Actions before starting a training epoch.

after_train_epoch(trainer)

Actions after finishing a training epoch.

before_test(trainer)

Actions before evaluation.

after_test(trainer)

Actions after evaluation.

before_test_epoch(trainer)

Actions before starting a testing epoch.

after_test_epoch(trainer)

Actions after finishing a testing epoch.

before_test_iter(trainer)

Actions before running a testing iteration.

after_test_iter(trainer, output, label, loss)

Actions after running a testing iteration.

Parameters
  • trainer (Trainer) – Trainer which is using this hook

  • output (torch.Tensor) – Output of the model

  • label (torch.Tensor) – Labels of the input data

  • loss (torch.Tensor) – Loss computed from the output and the labels

init_runner_states(trainer, key, val)

Initializes the trainer’s state.

Parameters
  • trainer (Trainer) – Trainer which is using this hook

  • key – Key of the state to reset

  • val – Value of the state to reset
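The callback names above suggest a nesting of training, epoch, and iteration. The following sketch records the order in which the training callbacks fire. Both `RecordingHook` and `run_sketch_training` are hypothetical stand-ins written for this example; the exact call order inside the real Trainer is an assumption here.

```python
# Stand-in classes for illustration only; these are not the ColossalAI classes.
class RecordingHook:
    """Records the name of every callback in the order it fires."""

    def __init__(self):
        self.calls = []

    def before_train(self, trainer):
        self.calls.append("before_train")

    def before_train_epoch(self, trainer):
        self.calls.append("before_train_epoch")

    def before_train_iter(self, trainer):
        self.calls.append("before_train_iter")

    def after_train_iter(self, trainer, output, label, loss):
        self.calls.append("after_train_iter")

    def after_train_epoch(self, trainer):
        self.calls.append("after_train_epoch")

    def after_train(self, trainer):
        self.calls.append("after_train")


def run_sketch_training(hook, epochs, iters_per_epoch):
    """Hypothetical loop; assumes callbacks nest as train > epoch > iteration."""
    trainer = object()  # placeholder for a real Trainer
    hook.before_train(trainer)
    for _ in range(epochs):
        hook.before_train_epoch(trainer)
        for _ in range(iters_per_epoch):
            hook.before_train_iter(trainer)
            output, label, loss = 0.0, 0.0, 0.0  # dummy values instead of tensors
            hook.after_train_iter(trainer, output, label, loss)
        hook.after_train_epoch(trainer)
    hook.after_train(trainer)


hook = RecordingHook()
run_sketch_training(hook, epochs=1, iters_per_epoch=2)
print(hook.calls)
```

The testing callbacks (`before_test`, `before_test_epoch`, `before_test_iter`, and their `after_` counterparts) would presumably follow the same pattern.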

class colossalai.trainer.hooks.MetricHook(priority)

Specialized hook classes for metrics. Some help metric collectors initialize, reset, and update their states; others display and record the metrics.

Parameters

priority (int) – Printing priority; hooks with a smaller priority value are printed first

class colossalai.trainer.hooks.LoadCheckpointHook(checkpoint_dir=None, epoch=-1, finetune=False, strict=False, suffix='', priority=0)

Loads the model before the training process starts.

Parameters
  • checkpoint_dir (str, optional) – Directory containing the saved checkpoint, defaults to None

  • epoch (int, optional) – Epoch number to be set, defaults to -1

  • finetune (bool, optional) – Whether to allow loading only part of the model, defaults to False

  • strict (bool, optional) – Whether to require that the loaded parameters have the same shapes as the model's parameters, defaults to False

  • suffix (str, optional) – Suffix, defaults to ''

  • priority (int, optional) – Printing priority; hooks with a smaller priority value are printed first, defaults to 0

before_train(trainer)

Loads parameters to the model before training.

class colossalai.trainer.hooks.SaveCheckpointHook(interval=1, checkpoint_dir=None, suffix='', priority=10)

Saves the model at a fixed interval during the training process.

Parameters
  • interval (int, optional) – Saving interval, defaults to 1

  • checkpoint_dir (str, optional) – Directory in which checkpoints are saved, defaults to None

  • suffix (str, optional) – Suffix of the saved file, defaults to ''

  • priority (int, optional) – Printing priority; hooks with a smaller priority value are printed first, defaults to 10

after_train_epoch(trainer)

Saves the model after a training epoch.
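To make the role of `interval` concrete, here is a minimal sketch of one plausible saving rule. The function `should_save` is hypothetical, and the condition it uses (zero-based epoch index, save when the completed-epoch count is a multiple of `interval`) is an assumption rather than the library's confirmed behavior.

```python
def should_save(epoch: int, interval: int = 1) -> bool:
    """Assumed rule: save when the number of completed epochs is a multiple of interval.

    `epoch` is treated as a zero-based index, so epoch + 1 epochs have finished.
    """
    return (epoch + 1) % interval == 0


# With interval=2, a checkpoint would be written after every second epoch.
saved_epochs = [epoch for epoch in range(6) if should_save(epoch, interval=2)]
print(saved_epochs)  # [1, 3, 5]
```

With the default `interval=1`, this rule would save after every epoch.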

class colossalai.trainer.hooks.LossHook(priority=0)

Specialized hook class for Loss.

Parameters

priority (int, optional) – Printing priority; hooks with a smaller priority value are printed first, defaults to 0

class colossalai.trainer.hooks.AccuracyHook(accuracy_func, priority=0)

Specialized hook class for Accuracy.

Parameters
  • accuracy_func (Callable) – Function used to compute the accuracy

  • priority (int, optional) – Printing priority; hooks with a smaller priority value are printed first, defaults to 0

class colossalai.trainer.hooks.LogMetricByEpochHook(logger, interval=1, priority=10)

Specialized hook to record the metric to the log.

Parameters
  • logger – Logger for the log

  • interval (int, optional) – Recording interval, defaults to 1

  • priority (int, optional) – Printing priority; hooks with a smaller priority value are printed first, defaults to 10

class colossalai.trainer.hooks.TensorboardHook(log_dir, ranks=None, parallel_mode=ParallelMode.GLOBAL, priority=10)

Specialized hook to record the metric to Tensorboard.

Parameters
  • log_dir (str) – Directory of log

  • ranks (List, optional) – Ranks of processors, defaults to None

  • parallel_mode (colossalai.context.parallel_mode.ParallelMode, optional) – Parallel mode, defaults to colossalai.context.parallel_mode.ParallelMode.GLOBAL

  • priority (int, optional) – Printing priority; hooks with a smaller priority value are printed first, defaults to 10

class colossalai.trainer.hooks.LogTimingByEpochHook(timer, logger, interval=1, priority=10, log_eval=True, ignore_num_train_steps=0)

Specialized hook to write timing records to the log.

Parameters
  • timer (colossalai.utils.MultiTimer) – Timer for the hook

  • logger (colossalai.logging.DistributedLogger) – Logger for the log

  • interval (int, optional) – Recording interval, defaults to 1

  • priority (int, optional) – Printing priority; hooks with a smaller priority value are printed first, defaults to 10

  • log_eval (bool, optional) – Whether to write logs during evaluation, defaults to True

  • ignore_num_train_steps (int, optional) – Number of training steps to ignore, defaults to 0

after_train_epoch(trainer)

Writes log after finishing a training epoch.

after_test_epoch(trainer)

Writes log after finishing a testing epoch.

class colossalai.trainer.hooks.LogMemoryByEpochHook(logger, interval=1, priority=10, log_eval=True, report_cpu=False)

Specialized hook to write memory usage records to the log.

Parameters
  • logger (colossalai.logging.DistributedLogger) – Logger for the log

  • interval (int, optional) – Recording interval, defaults to 1

  • priority (int, optional) – Printing priority; hooks with a smaller priority value are printed first, defaults to 10

  • log_eval (bool, optional) – Whether to write logs during evaluation, defaults to True

  • report_cpu (bool, optional) – Whether to report CPU memory usage, defaults to False

before_train(trainer)

Resets before training.

after_train_epoch(trainer)

Writes log after finishing a training epoch.

after_test(trainer)

Reports memory usage after testing.

class colossalai.trainer.hooks.LRSchedulerHook(lr_scheduler, by_epoch, store_lr_in_state=True, priority=1)

Builds the LR scheduler.

Parameters
  • lr_scheduler – LR scheduler

  • by_epoch (bool) – If True, the LR is updated every epoch; otherwise, it is updated every batch

  • store_lr_in_state (bool, optional) – If True, store the learning rate in each state, defaults to True

  • priority (int, optional) – Printing priority; hooks with a smaller priority value are printed first, defaults to 1
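The effect of `by_epoch` can be sketched with a small stand-in. `CountingScheduler` and `SketchLRHook` are hypothetical classes written for this example, and the choice of `after_train_iter` and `after_train_epoch` as the places where the scheduler steps is an assumption, not the confirmed implementation of `LRSchedulerHook`.

```python
class CountingScheduler:
    """Hypothetical scheduler that only counts how often step() is called."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


class SketchLRHook:
    """Stand-in for illustration; not the real LRSchedulerHook."""

    def __init__(self, lr_scheduler, by_epoch):
        self.lr_scheduler = lr_scheduler
        self.by_epoch = by_epoch

    def after_train_iter(self, trainer, output, label, loss):
        # Per-batch scheduling: step after every iteration.
        if not self.by_epoch:
            self.lr_scheduler.step()

    def after_train_epoch(self, trainer):
        # Per-epoch scheduling: step once at the end of each epoch.
        if self.by_epoch:
            self.lr_scheduler.step()


def count_steps(by_epoch, epochs=2, iters_per_epoch=3):
    """Drive the hook through a dummy loop and return the number of scheduler steps."""
    scheduler = CountingScheduler()
    hook = SketchLRHook(scheduler, by_epoch)
    for _ in range(epochs):
        for _ in range(iters_per_epoch):
            hook.after_train_iter(None, None, None, None)
        hook.after_train_epoch(None)
    return scheduler.steps


print(count_steps(by_epoch=True), count_steps(by_epoch=False))  # 2 6
```

With 2 epochs of 3 iterations each, the sketch steps the scheduler twice when `by_epoch=True` and six times otherwise, which is why schedulers defined in terms of total steps usually need `by_epoch=False`.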

class colossalai.trainer.hooks.ThroughputHook(ignored_steps=0, priority=10)

Specialized hook class for Throughput.

Parameters
  • ignored_steps (int, optional) – Number of initial training steps to ignore, defaults to 0

  • priority (int, optional) – Priority of the throughput hook, defaults to 10
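The `ignored_steps` argument in the signature suggests that early steps can be left out of the measurement. The sketch below shows why that might matter. The `throughput` function is hypothetical, and its definition (samples per second over the steps that are kept) is an assumption about what the hook measures.

```python
def throughput(batch_sizes, step_times, ignored_steps=0):
    """Samples per second over all steps after the first `ignored_steps` steps.

    Assumed meaning of ignored_steps: skip that many initial steps entirely.
    """
    kept_samples = sum(batch_sizes[ignored_steps:])
    kept_time = sum(step_times[ignored_steps:])
    return kept_samples / kept_time if kept_time > 0 else 0.0


batch_sizes = [32, 32, 32, 32]
step_times = [2.0, 0.5, 0.5, 0.5]  # seconds; the first step includes warm-up cost

all_steps = throughput(batch_sizes, step_times)
without_warmup = throughput(batch_sizes, step_times, ignored_steps=1)
print(all_steps, without_warmup)
```

In this made-up data the slow first step drags the average down, so skipping it gives a figure closer to steady-state performance.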

class colossalai.trainer.hooks.LogMetricByStepHook(priority=10)

Hook to log metrics at each step.

Parameters

priority (int, optional) – Printing priority; hooks with a smaller priority value are printed first, defaults to 10