colossalai.trainer.hooks
- class colossalai.trainer.hooks.BaseHook(priority)[source]
This class lets users run custom actions at specific points during training or evaluation.
- Parameters
priority (int) – Priority in the printing; hooks with a smaller priority are printed first.
- after_train_iter(trainer, output, label, loss)[source]
Actions after running a training iteration.
- Parameters
trainer (Trainer) – Trainer which is using this hook.
output (torch.Tensor) – Output of the model.
label (torch.Tensor) – Labels of the input data.
loss (torch.Tensor) – Loss between the output and input data.
- after_test_iter(trainer, output, label, loss)[source]
Actions after running a testing iteration.
- Parameters
trainer (Trainer) – Trainer which is using this hook.
output (torch.Tensor) – Output of the model.
label (torch.Tensor) – Labels of the input data.
loss (torch.Tensor) – Loss between the output and input data.
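The callback and priority behaviour described for BaseHook can be illustrated with a small, dependency-free sketch. `SketchHook` and `order_hooks` are hypothetical stand-ins written for this example, not library classes. The sketch assumes hooks are ordered by ascending priority and that ties keep their position in the hook list, as the priority descriptions on this page state.

```python
from dataclasses import dataclass, field


@dataclass
class SketchHook:
    """Hypothetical stand-in for a BaseHook subclass; not the library class."""
    name: str
    priority: int
    calls: list = field(default_factory=list)

    def after_train_iter(self, trainer, output, label, loss):
        # Record the call so the dispatch can be inspected afterwards.
        self.calls.append(("train", loss))

    def after_test_iter(self, trainer, output, label, loss):
        self.calls.append(("test", loss))


def order_hooks(hooks):
    # sorted() is stable, so hooks with equal priority keep their list order.
    return sorted(hooks, key=lambda h: h.priority)


hooks = [
    SketchHook("save", priority=10),
    SketchHook("loss", priority=0),
    SketchHook("log", priority=10),
    SketchHook("lr", priority=1),
]
ordered_names = [h.name for h in order_hooks(hooks)]

# Dispatch one training iteration to every hook in priority order.
for hook in order_hooks(hooks):
    hook.after_train_iter(trainer=None, output=[0.2, 0.8], label=1, loss=0.22)
```

Here "save" and "log" share priority 10, so "save" stays ahead of "log" because it comes first in the list.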
- class colossalai.trainer.hooks.MetricHook(priority)[source]
Specialized hook classes for Metric. Some help metric collectors initialize, reset and update their states. Others are used to display and record the metric.
- Parameters
priority (int) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 1. If several hooks share the same priority, they are printed in the order they appear in the hook list.
- class colossalai.trainer.hooks.LoadCheckpointHook(checkpoint_dir=None, epoch=-1, finetune=False, strict=False, suffix='', priority=0)[source]
Loads the model before the training process starts.
- Parameters
checkpoint_dir (str, optional) – Directory where checkpoints are saved, defaults to None.
epoch (int, optional) – Epoch number of the checkpoint to load, defaults to -1. An epoch of -1 selects the latest checkpoint.
finetune (bool, optional) – Whether to allow loading only a part of the model, defaults to False.
strict (bool, optional) – Whether to strictly enforce that the keys in state_dict of the checkpoint match the names of parameters and buffers in model, defaults to False.
suffix (str, optional) – Suffix of checkpoint file path, defaults to ''.
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 0. If several hooks share the same priority, they are printed in the order they appear in the hook list.
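The `epoch` and `strict` parameters can be read as two small decisions, sketched below without any library dependency. `resolve_epoch` and `check_keys` are hypothetical helpers for illustration only. How the hook actually discovers checkpoints on disk is not described here, so the list of available epochs is simply passed in.

```python
def resolve_epoch(available_epochs, epoch=-1):
    """Pick which saved epoch to load; -1 means the latest one available."""
    if not available_epochs:
        raise FileNotFoundError("no checkpoints available")
    if epoch == -1:
        return max(available_epochs)
    if epoch not in available_epochs:
        raise FileNotFoundError(f"no checkpoint for epoch {epoch}")
    return epoch


def check_keys(model_keys, checkpoint_keys, strict=False):
    """Compare parameter names; with strict=True any mismatch is an error."""
    missing = sorted(set(model_keys) - set(checkpoint_keys))
    unexpected = sorted(set(checkpoint_keys) - set(model_keys))
    if strict and (missing or unexpected):
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    return missing, unexpected


saved = [1, 2, 5]
latest = resolve_epoch(saved)             # epoch=-1 selects the newest
explicit = resolve_epoch(saved, epoch=2)  # an explicit epoch is used as given

# Non-strict comparison reports mismatches instead of raising.
mismatch = check_keys(["w", "b"], ["w", "extra"])
```

With `strict=False` the mismatched names are reported and loading can continue, which is the lenient behaviour one would typically want when fine-tuning from a partially matching checkpoint.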
- class colossalai.trainer.hooks.SaveCheckpointHook(interval=1, checkpoint_dir=None, suffix='', priority=10)[source]
Saves the model at a fixed interval during the training process.
- Parameters
interval (int, optional) – Saving interval, defaults to 1.
checkpoint_dir (str, optional) – Directory where checkpoints are saved, defaults to None.
suffix (str, optional) – Suffix of the saved file, defaults to ''.
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order they appear in the hook list.
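As a rough illustration of the `interval` parameter, the sketch below lists the epochs at which a checkpoint would be written. It assumes epochs are counted from 1 and that a save happens whenever the epoch number is a multiple of `interval`. The exact counting convention used by the hook is not stated on this page, and `epochs_to_save` is a hypothetical helper.

```python
def epochs_to_save(num_epochs, interval=1):
    # Assumption: epochs are counted from 1 and a checkpoint is written
    # whenever the epoch number is a multiple of `interval`.
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return [e for e in range(1, num_epochs + 1) if e % interval == 0]


every_epoch = epochs_to_save(4)               # default interval of 1
every_third = epochs_to_save(10, interval=3)  # sparser checkpoints
```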
- class colossalai.trainer.hooks.LossHook(priority=0)[source]
Specialized hook class for Loss.
- Parameters
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 0. If several hooks share the same priority, they are printed in the order they appear in the hook list.
- class colossalai.trainer.hooks.AccuracyHook(accuracy_func, priority=0)[source]
Specialized hook class for Accuracy.
- Parameters
accuracy_func (typing.Callable) – Accuracy function for the classification task.
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 0. If several hooks share the same priority, they are printed in the order they appear in the hook list.
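A callable suitable for `accuracy_func` might look like the sketch below. It assumes the function receives the model outputs and the labels and returns the number of correct predictions. That contract is an assumption, not something this page specifies. The sketch uses plain Python sequences to stay dependency-free, whereas the hook would presumably pass torch.Tensor objects.

```python
def accuracy_func(logits, targets):
    """Count predictions whose highest-scoring class equals the target."""
    correct = 0
    for row, target in zip(logits, targets):
        # Index of the largest score in this row (arg-max over classes).
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == target)
    return correct


logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
targets = [1, 0, 0]
num_correct = accuracy_func(logits, targets)
accuracy = num_correct / len(targets)
```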
- class colossalai.trainer.hooks.LogMetricByEpochHook(logger, interval=1, priority=10)[source]
Specialized hook to record the metric to log.
- Parameters
logger (colossalai.logging.DistributedLogger) – Logger for recording the log information.
interval (int, optional) – Interval of printing log information, defaults to 1.
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order they appear in the hook list.
- class colossalai.trainer.hooks.TensorboardHook(log_dir, ranks=None, parallel_mode=ParallelMode.GLOBAL, priority=10)[source]
Specialized hook to record the metric to Tensorboard.
- Parameters
log_dir (str) – Directory of log.
ranks (list) – Ranks of processors.
parallel_mode (colossalai.context.parallel_mode.ParallelMode, optional) – Parallel mode used in trainer, defaults to colossalai.context.parallel_mode.ParallelMode.GLOBAL.
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order they appear in the hook list.
- class colossalai.trainer.hooks.LogTimingByEpochHook(timer, logger, interval=1, priority=10, log_eval=True, ignore_num_train_steps=0)[source]
Specialized hook to write timing records to the log.
- Parameters
timer (colossalai.utils.MultiTimer) – Timer for the hook.
logger (colossalai.logging.DistributedLogger) – Logger for recording the log information.
interval (int, optional) – Interval of printing log information, defaults to 1.
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order they appear in the hook list.
log_eval (bool, optional) – Whether to write timing records during evaluation, defaults to True.
ignore_num_train_steps (int, optional) – Number of training steps to ignore, defaults to 0.
- class colossalai.trainer.hooks.LogMemoryByEpochHook(logger, interval=1, priority=10, log_eval=True, report_cpu=False)[source]
Specialized hook to write memory usage records to the log.
- Parameters
logger (colossalai.logging.DistributedLogger) – Logger for recording the log information.
interval (int, optional) – Interval of printing log information, defaults to 1.
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order they appear in the hook list.
log_eval (bool, optional) – Whether to write memory usage records during evaluation, defaults to True.
- class colossalai.trainer.hooks.LRSchedulerHook(lr_scheduler, by_epoch, store_lr_in_state=True, priority=1)[source]
Build LR scheduler for trainer.
- Parameters
lr_scheduler (colossalai.nn.lr_scheduler) – The LR scheduler to use, chosen from colossalai.nn.lr_scheduler; more details about lr_scheduler can be found in lr_scheduler.
by_epoch (bool) – If True, the LR will be scheduled every epoch. Otherwise, the LR will be scheduled every batch.
store_lr_in_state (bool, optional) – If True, store the learning rate in each state, defaults to True.
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 1. If several hooks share the same priority, they are printed in the order they appear in the hook list.
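The effect of `by_epoch` can be seen in a minimal, library-free sketch. `CountingScheduler` is a hypothetical scheduler that halves the learning rate on each `step()`, and `run` is a toy training loop. Neither is part of colossalai.

```python
class CountingScheduler:
    """Hypothetical scheduler that halves the learning rate on every step()."""

    def __init__(self, lr):
        self.lr = lr
        self.num_steps = 0

    def step(self):
        self.lr *= 0.5
        self.num_steps += 1


def run(scheduler, num_epochs, batches_per_epoch, by_epoch):
    for _ in range(num_epochs):
        for _ in range(batches_per_epoch):
            if not by_epoch:
                scheduler.step()  # scheduled after every batch
        if by_epoch:
            scheduler.step()      # scheduled once per epoch
    return scheduler


per_epoch = run(CountingScheduler(1.0), num_epochs=2, batches_per_epoch=4, by_epoch=True)
per_batch = run(CountingScheduler(1.0), num_epochs=2, batches_per_epoch=4, by_epoch=False)
```

With two epochs of four batches each, the per-epoch variant steps twice and the per-batch variant steps eight times, so the same scheduler decays far faster when `by_epoch` is False.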
- class colossalai.trainer.hooks.ThroughputHook(ignored_steps=0, priority=10)[source]
Specialized hook class for Throughput. Hook to measure execution throughput (samples/sec).
- Parameters
ignored_steps (int, optional) – Number of initial training steps to ignore, defaults to 0.
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order they appear in the hook list.
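To show why `ignored_steps` matters, the sketch below computes samples per second from per-step records and optionally drops the first few steps, which are often slowed by warm-up. The `throughput` helper and the record format are assumptions made for this example. The hook's internal bookkeeping is not described on this page.

```python
def throughput(step_records, ignored_steps=0):
    """step_records: list of (num_samples, seconds) pairs, one per training step."""
    kept = step_records[ignored_steps:]
    total_time = sum(seconds for _, seconds in kept)
    if total_time <= 0:
        return 0.0  # nothing measured yet
    total_samples = sum(num_samples for num_samples, _ in kept)
    return total_samples / total_time


# The first step is much slower than the rest (e.g. warm-up overhead).
records = [(32, 4.0), (32, 0.5), (32, 0.5)]
with_warmup = throughput(records)
without_warmup = throughput(records, ignored_steps=1)
```

Skipping the slow first step raises the reported figure from about 19 to 64 samples/sec in this toy data, which is closer to the steady-state rate.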
- class colossalai.trainer.hooks.LogMetricByStepHook(priority=10)[source]
Hook to log metrics at every step.
- Parameters
priority (int, optional) – Priority in the printing; hooks with a smaller priority are printed first. Defaults to 10. If several hooks share the same priority, they are printed in the order they appear in the hook list.