colossalai.utils.profiler
- class colossalai.utils.profiler.CommProfiler(depth=0, total_count=0, total_comm_vol=0, total_cuda_time=0)
Communication profiler. Records all communication events.
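The description above does not say how communication events are captured. As a rough illustration only (not ColossalAI's implementation), the sketch below wraps a communication function so that each call increments a counter, adds to a byte volume, and accumulates elapsed time, loosely mirroring the `total_count`, `total_comm_vol` and `total_cuda_time` constructor arguments. `CommStats`, `record_comm` and `fake_all_reduce` are hypothetical names, and wall-clock time stands in for CUDA time.

```python
import functools
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CommStats:
    """Hypothetical counters loosely mirroring CommProfiler's constructor arguments."""
    total_count: int = 0     # number of communication calls seen
    total_comm_vol: int = 0  # bytes moved, as reported by `nbytes_of`
    total_time: float = 0.0  # wall-clock seconds (a stand-in for CUDA time)


def record_comm(fn: Callable[..., Any], stats: CommStats,
                nbytes_of: Callable[..., int]) -> Callable[..., Any]:
    """Wrap a communication function so every call updates `stats`."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        stats.total_time += time.perf_counter() - start
        stats.total_count += 1
        stats.total_comm_vol += nbytes_of(*args, **kwargs)
        return result
    return wrapper


# Usage with a stand-in "all_reduce" that operates on a bytearray.
def fake_all_reduce(buf: bytearray) -> None:
    pass  # a real collective would exchange `buf` with the other ranks


stats = CommStats()
all_reduce = record_comm(fake_all_reduce, stats, lambda buf: len(buf))
all_reduce(bytearray(400))
all_reduce(bytearray(100))
print(stats.total_count, stats.total_comm_vol)  # 2 500
```

Wrapping at the call site keeps the measurement independent of the backend; a real profiler on GPUs would instead time the kernels on the CUDA stream.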
- class colossalai.utils.profiler.PcieProfiler(dtype='fp32', depth=1)
PCIe profiler. Records all data transfers between CPU and GPU.
TODO: Merge the PCIe profiler into the communication profiler.
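The signature exposes a `dtype` argument, but this page does not explain how it is used. Assuming it fixes the number of bytes per element when recorded element counts are converted into a transfer volume, the accounting could look like the sketch below. `BYTES_PER_ELEMENT`, `CopyEvent` and `pcie_volume` are hypothetical names, not part of ColossalAI.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Assumed mapping from the `dtype` string to bytes per element.
BYTES_PER_ELEMENT: Dict[str, int] = {"fp32": 4, "fp16": 2, "bf16": 2}


@dataclass
class CopyEvent:
    """Hypothetical record of one tensor copy."""
    src: str    # "cpu" or "cuda"
    dst: str    # "cpu" or "cuda"
    numel: int  # number of elements copied


def pcie_volume(events: Iterable[CopyEvent], dtype: str = "fp32") -> Tuple[int, int]:
    """Return (host_to_device_bytes, device_to_host_bytes) for the given events."""
    width = BYTES_PER_ELEMENT[dtype]
    h2d = d2h = 0
    for ev in events:
        if ev.src == "cpu" and ev.dst == "cuda":
            h2d += ev.numel * width
        elif ev.src == "cuda" and ev.dst == "cpu":
            d2h += ev.numel * width
        # copies that stay on the same device type are not counted as CPU<->GPU traffic
    return h2d, d2h


events = [CopyEvent("cpu", "cuda", 1024), CopyEvent("cuda", "cpu", 256),
          CopyEvent("cuda", "cuda", 4096)]
print(pcie_volume(events))          # (4096, 1024)
print(pcie_volume(events, "fp16"))  # (2048, 512)
```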
- class colossalai.utils.profiler.MemProfiler(engine, warmup=50, refreshrate=10)
Wrapper of MemOpHook, used to show GPU memory usage at each training iteration.
To use this profiler, pass it an engine instance; otherwise its usage is the same as that of CommProfiler:
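```python
from torch.utils.tensorboard import SummaryWriter
from colossalai.utils.profiler import MemProfiler, ProfilerContext

# `engine` must be an existing ColossalAI engine instance
mm_prof = MemProfiler(engine)
with ProfilerContext([mm_prof]) as prof:
    writer = SummaryWriter("mem")
    engine.train()
    ...
    prof.to_file("./log")
    prof.to_tensorboard(writer)
```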
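The `warmup` and `refreshrate` arguments are not documented on this page. Assuming `warmup` is the number of initial iterations that are skipped and `refreshrate` is how often (in recorded iterations) the collected statistics are refreshed, a minimal per-iteration tracker could behave as sketched below. `MemTracker` and its `read_memory` callback are hypothetical; on a GPU the callback might wrap a call such as `torch.cuda.memory_allocated`.

```python
from typing import Callable, List


class MemTracker:
    """Hypothetical per-iteration memory tracker with warmup and refresh rate."""

    def __init__(self, read_memory: Callable[[], int],
                 warmup: int = 50, refreshrate: int = 10) -> None:
        self.read_memory = read_memory  # returns current usage in bytes
        self.warmup = warmup            # iterations skipped before recording starts
        self.refreshrate = refreshrate  # report after every `refreshrate` recorded iterations
        self.iteration = 0
        self.samples: List[int] = []

    def step(self) -> None:
        """Call once at the end of each training iteration."""
        self.iteration += 1
        if self.iteration <= self.warmup:
            return  # warmup iterations are not recorded
        self.samples.append(self.read_memory())
        if len(self.samples) % self.refreshrate == 0:
            print(f"iter {self.iteration}: peak so far {max(self.samples)} bytes")


# Fake readings in bytes: 0, 10, 20, ...
usage = iter(range(0, 1000, 10))
tracker = MemTracker(lambda: next(usage), warmup=3, refreshrate=2)
for _ in range(7):
    tracker.step()
print(tracker.samples)  # [0, 10, 20, 30]
```

Skipping the first iterations avoids recording one-off allocations (for example, lazily created optimizer state) that would distort the steady-state picture.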
- class colossalai.utils.profiler.ProfilerContext(profilers=None, enable=True)
Profiler context manager. Usage:
```python
import torch
import torch.distributed as dist
from colossalai.utils import get_current_device
from colossalai.utils.profiler import CommProfiler, ProfilerContext

# Assumes torch.distributed is already initialized with 4 processes (e.g. via colossalai.launch)
world_size = 4
inputs = torch.randn(10, 10, dtype=torch.float32, device=get_current_device())
outputs = torch.empty(world_size, 10, 10, dtype=torch.float32, device=get_current_device())
outputs_list = list(torch.chunk(outputs, chunks=world_size, dim=0))

cc_prof = CommProfiler()
with ProfilerContext([cc_prof]) as prof:
    op = dist.all_reduce(inputs, async_op=True)
    dist.all_gather(outputs_list, inputs)
    op.wait()
    dist.reduce_scatter(inputs, outputs_list)
    dist.broadcast(inputs, 0)
    dist.reduce(inputs, 0)

prof.show()
```
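The `profilers` and `enable` arguments suggest a context manager that switches a list of profilers on for the duration of a `with` block. The stand-alone sketch below shows that pattern under this assumption; it is not ColossalAI's code, and `BaseProfiler`, `SimpleProfilerContext` and the `enable`/`disable`/`result_str` methods are hypothetical names chosen for illustration.

```python
from typing import List, Optional


class BaseProfiler:
    """Hypothetical interface that each profiler implements."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active = False

    def enable(self) -> None:
        self.active = True

    def disable(self) -> None:
        self.active = False

    def result_str(self) -> str:
        return f"{self.name}: collected"


class SimpleProfilerContext:
    """Turn a list of profilers on for the duration of a `with` block."""

    def __init__(self, profilers: Optional[List[BaseProfiler]] = None,
                 enable: bool = True) -> None:
        self.profilers = profilers if profilers is not None else []
        self.enable = enable  # when False, the context is a no-op

    def __enter__(self) -> "SimpleProfilerContext":
        if self.enable:
            for prof in self.profilers:
                prof.enable()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self.enable:
            for prof in self.profilers:
                prof.disable()
        return False  # never swallow exceptions raised inside the block

    def show(self) -> None:
        for prof in self.profilers:
            print(prof.result_str())


p = BaseProfiler("comm")
with SimpleProfilerContext([p]) as ctx:
    assert p.active
assert not p.active
ctx.show()  # comm: collected
```

Because the results stay on the profiler objects after `__exit__`, reporting methods such as `show()` can still be called once the `with` block has ended.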