colossalai.trainer.hooks
- class colossalai.trainer.hooks.BaseHook(priority)
Base class for hooks. It lets users add custom actions at specific points during training or evaluation.
- Parameters
priority (int) – Priority for printing; hooks with a smaller priority are printed first
- after_hook_is_attached(trainer)
Actions after hooks are attached to the trainer.
- before_train(trainer)
Actions before training.
- after_train(trainer)
Actions after training.
- before_train_iter(trainer)
Actions before running a training iteration.
- after_train_iter(trainer, output, label, loss)
Actions after running a training iteration.
- Parameters
trainer (Trainer) – Trainer which is using this hook
output (torch.Tensor) – Output of the model
label (torch.Tensor) – Labels of the input data
loss (torch.Tensor) – Loss between the output and the labels
- before_train_epoch(trainer)
Actions before starting a training epoch.
- after_train_epoch(trainer)
Actions after finishing a training epoch.
- before_test(trainer)
Actions before evaluation.
- after_test(trainer)
Actions after evaluation.
- before_test_epoch(trainer)
Actions before starting a testing epoch.
- after_test_epoch(trainer)
Actions after finishing a testing epoch.
- before_test_iter(trainer)
Actions before running a testing iteration.
- after_test_iter(trainer, output, label, loss)
Actions after running a testing iteration.
- Parameters
trainer (Trainer) – Trainer which is using this hook
output (torch.Tensor) – Output of the model
label (torch.Tensor) – Labels of the input data
loss (torch.Tensor) – Loss between the output and the labels
- init_runner_states(trainer, key, val)
Initializes trainer’s state.
- Parameters
trainer (Trainer) – Trainer which is using this hook
key – Key of the state to reset
val – Value of the state to reset
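The callbacks above can be illustrated with a small, self-contained sketch. It does not import ColossalAI: `BaseHook` here is a stand-in that mirrors only the documented method names, and `RecordingHook` and `toy_train` are hypothetical names introduced for illustration. The call order used by `toy_train` is an assumption about a typical training loop, not a statement of how the real `Trainer` schedules hooks.

```python
# Stand-in for colossalai.trainer.hooks.BaseHook so the sketch runs without
# ColossalAI installed; only the documented training callbacks are mirrored.
class BaseHook:
    def __init__(self, priority):
        self.priority = priority

    def before_train(self, trainer):
        pass

    def before_train_epoch(self, trainer):
        pass

    def before_train_iter(self, trainer):
        pass

    def after_train_iter(self, trainer, output, label, loss):
        pass

    def after_train_epoch(self, trainer):
        pass

    def after_train(self, trainer):
        pass


class RecordingHook(BaseHook):
    """Hypothetical hook that records the order in which it is called."""

    def __init__(self, priority=0):
        super().__init__(priority)
        self.events = []

    def before_train(self, trainer):
        self.events.append("before_train")

    def before_train_epoch(self, trainer):
        self.events.append("before_train_epoch")

    def before_train_iter(self, trainer):
        self.events.append("before_train_iter")

    def after_train_iter(self, trainer, output, label, loss):
        self.events.append("after_train_iter")

    def after_train_epoch(self, trainer):
        self.events.append("after_train_epoch")

    def after_train(self, trainer):
        self.events.append("after_train")


def toy_train(hooks, num_epochs, num_iters):
    """Hypothetical driver loop; the real Trainer may order calls differently."""
    trainer = object()  # placeholder, the hooks in this sketch never inspect it
    for hook in hooks:
        hook.before_train(trainer)
    for _ in range(num_epochs):
        for hook in hooks:
            hook.before_train_epoch(trainer)
        for _ in range(num_iters):
            for hook in hooks:
                hook.before_train_iter(trainer)
            for hook in hooks:
                hook.after_train_iter(trainer, output=None, label=None, loss=0.5)
        for hook in hooks:
            hook.after_train_epoch(trainer)
    for hook in hooks:
        hook.after_train(trainer)


hook = RecordingHook()
toy_train([hook], num_epochs=1, num_iters=2)
print(hook.events)
```

With the real library, a subclass would override only the callbacks it needs and leave the rest at their defaults.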
- class colossalai.trainer.hooks.MetricHook(priority)
Specialized hook classes for Metric. Some help metric collectors initialize, reset and update their states. Others are used to display and record the metric.
- Parameters
priority (int) – Priority for printing; hooks with a smaller priority are printed first
- class colossalai.trainer.hooks.LoadCheckpointHook(checkpoint_dir=None, epoch=-1, finetune=False, strict=False, suffix='', priority=0)
Loads the model before the training process.
- Parameters
checkpoint_dir (str, optional) – Directory of the saved checkpoint, defaults to None
epoch (int, optional) – Epoch number to be set, defaults to -1
finetune (bool, optional) – Whether to allow loading only part of the model, defaults to False
strict (bool, optional) – Whether to load only a model whose parameters have the same shapes, defaults to False
suffix (str, optional) – Suffix, defaults to ''
priority (int, optional) – Priority for printing; hooks with a smaller priority are printed first, defaults to 0
- before_train(trainer)
Loads parameters to the model before training.
- class colossalai.trainer.hooks.SaveCheckpointHook(interval=1, checkpoint_dir=None, suffix='', priority=10)
Saves the model at a fixed interval during the training process.
- Parameters
interval (int, optional) – Saving interval, defaults to 1
checkpoint_dir (str, optional) – Directory for saving checkpoints, defaults to None
suffix (str, optional) – Suffix of the saved file, defaults to ''
priority (int, optional) – Priority for printing; hooks with a smaller priority are printed first, defaults to 10
- after_train_epoch(trainer)
Saves the model after a training epoch.
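As a rough illustration of what "saving by interval" after each training epoch can mean, the sketch below filters epoch indices with a modulo test. The function name `should_save` is hypothetical, and both the zero-based epoch counting and the exact modulo rule are assumptions; the library's actual condition may differ.

```python
def should_save(epoch, interval=1):
    """Assumed rule: save whenever the zero-based epoch index is a multiple of interval."""
    return epoch % interval == 0


# Epochs at which a checkpoint would be written under this assumed rule.
saved_every_epoch = [e for e in range(6) if should_save(e)]
saved_every_other = [e for e in range(6) if should_save(e, interval=2)]
print(saved_every_epoch)
print(saved_every_other)
```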
- class colossalai.trainer.hooks.LossHook(priority=0)
Specialized hook class for Loss.
- Parameters
priority (int, optional) – Priority for printing; hooks with a smaller priority are printed first, defaults to 0
- class colossalai.trainer.hooks.AccuracyHook(accuracy_func, priority=0)
Specialized hook class for Accuracy.
- Parameters
accuracy_func (Callable) – Function used to compute the accuracy
priority (int, optional) – Priority for printing; hooks with a smaller priority are printed first, defaults to 0
- class colossalai.trainer.hooks.LogMetricByEpochHook(logger, interval=1, priority=10)
Specialized hook to record metrics to the log.
- Parameters
logger – Logger for the log
interval (int, optional) – Recording interval, defaults to 1
priority (int, optional) – Priority for printing; hooks with a smaller priority are printed first, defaults to 10
- class colossalai.trainer.hooks.TensorboardHook(log_dir, ranks=None, parallel_mode=ParallelMode.GLOBAL, priority=10)
Specialized hook to record metrics to TensorBoard.
- Parameters
log_dir (str) – Directory of log
ranks (List) – Ranks of processors
parallel_mode (colossalai.context.parallel_mode.ParallelMode, optional) – Parallel mode, defaults to colossalai.context.parallel_mode.ParallelMode.GLOBAL
priority (int, optional) – Priority for printing; hooks with a smaller priority are printed first, defaults to 10
- class colossalai.trainer.hooks.LogTimingByEpochHook(timer, logger, interval=1, priority=10, log_eval=True, ignore_num_train_steps=0)
Specialized hook to write timing records to the log.
- Parameters
timer (colossalai.utils.MultiTimer) – Timer for the hook
logger (colossalai.logging.DistributedLogger) – Logger for the log
interval (int, optional) – Recording interval, defaults to 1
priority (int, optional) – Priority for printing; hooks with a smaller priority are printed first, defaults to 10
log_eval (bool, optional) – Whether to write logs during evaluation, defaults to True
ignore_num_train_steps (int, optional) – Number of training steps to ignore, defaults to 0
- after_train_epoch(trainer)
Writes log after finishing a training epoch.
- after_test_epoch(trainer)
Writes log after finishing a testing epoch.
- class colossalai.trainer.hooks.LogMemoryByEpochHook(logger, interval=1, priority=10, log_eval=True, report_cpu=False)
Specialized hook to write memory usage records to the log.
- Parameters
logger (colossalai.logging.DistributedLogger) – Logger for the log
interval (int, optional) – Recording interval, defaults to 1
priority (int, optional) – Priority for printing; hooks with a smaller priority are printed first, defaults to 10
log_eval (bool, optional) – Whether to write logs during evaluation, defaults to True
- before_train(trainer)
Resets before training.
- after_train_epoch(trainer)
Writes log after finishing a training epoch.
- after_test(trainer)
Reports after testing.
- class colossalai.trainer.hooks.LRSchedulerHook(lr_scheduler, by_epoch, store_lr_in_state=True, priority=1)
Builds the LR scheduler.
- Parameters
lr_scheduler – LR scheduler
by_epoch (bool) – If True, the LR is scheduled every epoch; otherwise, it is scheduled every batch
store_lr_in_state (bool, optional) – If True, stores the learning rate in each state, defaults to True
priority (int, optional) – Priority for printing; hooks with a smaller priority are printed first, defaults to 1
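The effect of `by_epoch` can be sketched without the library. `CountingScheduler`, `SketchLRHook`, and `run` below are hypothetical stand-ins, and placing the `step()` call in `after_train_epoch` or `after_train_iter` is an assumption based on the parameter description, not a copy of the library's implementation.

```python
class CountingScheduler:
    """Hypothetical stand-in for an LR scheduler; it only counts step() calls."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


class SketchLRHook:
    """Illustrative hook: steps the scheduler per epoch or per batch."""

    def __init__(self, lr_scheduler, by_epoch):
        self.lr_scheduler = lr_scheduler
        self.by_epoch = by_epoch

    def after_train_iter(self, trainer, output, label, loss):
        # Per-batch scheduling: step after every training iteration.
        if not self.by_epoch:
            self.lr_scheduler.step()

    def after_train_epoch(self, trainer):
        # Per-epoch scheduling: step once when an epoch finishes.
        if self.by_epoch:
            self.lr_scheduler.step()


def run(by_epoch, num_epochs=3, iters_per_epoch=4):
    """Drive the hook through a toy loop and return the number of LR steps."""
    scheduler = CountingScheduler()
    hook = SketchLRHook(scheduler, by_epoch)
    for _ in range(num_epochs):
        for _ in range(iters_per_epoch):
            hook.after_train_iter(None, None, None, None)
        hook.after_train_epoch(None)
    return scheduler.steps


print(run(by_epoch=True))   # one step per epoch
print(run(by_epoch=False))  # one step per batch
```

With 3 epochs of 4 batches each, per-epoch scheduling steps 3 times and per-batch scheduling steps 12 times, which is why a scheduler's decay settings need to match the chosen granularity.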
- class colossalai.trainer.hooks.ThroughputHook(ignored_steps=0, priority=10)
Specialized hook class for Throughput.
- Parameters
priority (int, optional) – Priority of the throughput hook, defaults to 10
- class colossalai.trainer.hooks.LogMetricByStepHook(priority=10)
Hook to log metrics by step.
- Parameters
priority (int, optional) – Priority for printing; hooks with a smaller priority are printed first, defaults to 10
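To see how the default priorities listed in this reference order a set of hooks, the sketch below sorts lightweight stubs in ascending priority. `HookStub` is a hypothetical placeholder rather than a library class, and the tie-breaking shown (original list order, because Python's sort is stable) is an assumption; the library may resolve equal priorities differently.

```python
from dataclasses import dataclass


@dataclass
class HookStub:
    """Hypothetical placeholder carrying only a name and a priority."""
    name: str
    priority: int


# Priorities are the documented defaults for each hook class.
hooks = [
    HookStub("SaveCheckpointHook", 10),
    HookStub("LRSchedulerHook", 1),
    HookStub("LossHook", 0),
    HookStub("TensorboardHook", 10),
    HookStub("LoadCheckpointHook", 0),
]

# Smaller priority first; equal priorities keep their original relative order.
ordered = sorted(hooks, key=lambda h: h.priority)
names = [h.name for h in ordered]
print(names)
```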