colossalai.utils.profiler

class colossalai.utils.profiler.CommProfiler(depth=0, total_count=0, total_comm_vol=0, total_cuda_time=0)[source]

Communication profiler. Records all communication events.

class colossalai.utils.profiler.PcieProfiler(dtype='fp32', depth=1)[source]

PCIe profiler. Records all data transmission between CPU and GPU.

TODO: Merge pcie profiler into communication profiler
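To make "data transmission" volume concrete, the sketch below estimates the bytes moved by one CPU-to-GPU copy from a tensor shape and a dtype label. It is an illustration only, not PcieProfiler's implementation: `BYTES_PER_ELEMENT` and `transfer_bytes` are hypothetical names, and treating the `dtype` argument as a per-element size selector is an assumption inferred from the signature.

```python
from math import prod

# Hypothetical mapping from dtype labels to element sizes in bytes.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2}


def transfer_bytes(shape, dtype="fp32"):
    """Estimate bytes moved when a tensor of `shape` is copied between CPU and GPU."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]


# A 10 x 10 fp32 tensor holds 100 elements of 4 bytes each.
print(transfer_bytes((10, 10)))          # 400
print(transfer_bytes((10, 10), "fp16"))  # 200
```

Under this assumption, halving the element size halves the estimated traffic for the same shape.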

class colossalai.utils.profiler.MemProfiler(engine, warmup=50, refreshrate=10)[source]

Wrapper of MemOpHook, used to show GPU memory usage through each iteration.

To use this profiler, pass an engine instance. Usage is otherwise the same as for CommProfiler.

    mm_prof = MemProfiler(engine)
    with ProfilerContext([mm_prof]) as prof:
        writer = SummaryWriter("mem")
        engine.train()
        ...
        prof.to_file("./log")
        prof.to_tensorboard(writer)

class colossalai.utils.profiler.ProfilerContext(profilers=None, enable=True)[source]

Profiler context manager. Usage:

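```python
import torch
import torch.distributed as dist

from colossalai.utils import get_current_device
from colossalai.utils.profiler import CommProfiler, ProfilerContext

# Assumes the distributed process group is already initialized with 4 ranks.
world_size = 4
inputs = torch.randn(10, 10, dtype=torch.float32, device=get_current_device())
outputs = torch.empty(world_size, 10, 10, dtype=torch.float32, device=get_current_device())
outputs_list = list(torch.chunk(outputs, chunks=world_size, dim=0))

cc_prof = CommProfiler()

# Every collective issued inside the context is recorded by cc_prof.
with ProfilerContext([cc_prof]) as prof:
    op = dist.all_reduce(inputs, async_op=True)
    dist.all_gather(outputs_list, inputs)
    op.wait()
    dist.reduce_scatter(inputs, outputs_list)
    dist.broadcast(inputs, 0)
    dist.reduce(inputs, 0)

# Print the collected statistics.
prof.show()
```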
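The usage above follows the common enable-on-enter, disable-on-exit pattern. A minimal, dependency-free sketch of that pattern is shown below. `ToyProfiler` and `ToyProfilerContext` are hypothetical stand-ins written for illustration. They mirror the `(profilers=None, enable=True)` signature but are not the library's classes, and the `enable()`/`disable()`/`record()` methods are assumptions.

```python
class ToyProfiler:
    """Hypothetical stand-in for a profiler; it only collects event names."""

    def __init__(self):
        self.active = False
        self.events = []

    def enable(self):
        self.active = True

    def disable(self):
        self.active = False

    def record(self, name):
        # Events are kept only while the profiler is active.
        if self.active:
            self.events.append(name)

    def show(self):
        return f"{len(self.events)} event(s): {', '.join(self.events)}"


class ToyProfilerContext:
    """Hypothetical context manager that switches a list of profilers on and off."""

    def __init__(self, profilers=None, enable=True):
        self.profilers = list(profilers or [])
        self.enable = enable

    def __enter__(self):
        if self.enable:
            for p in self.profilers:
                p.enable()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and on exceptions, so profilers are always switched off.
        if self.enable:
            for p in self.profilers:
                p.disable()
        return False  # do not swallow exceptions

    def show(self):
        for p in self.profilers:
            print(p.show())


toy_prof = ToyProfiler()
toy_prof.record("before")          # ignored: profiler not yet active
with ToyProfilerContext([toy_prof]) as ctx:
    toy_prof.record("all_reduce")
    toy_prof.record("broadcast")
toy_prof.record("after")           # ignored: profiler disabled on exit
ctx.show()                         # prints "2 event(s): all_reduce, broadcast"
```

In this sketch, passing `enable=False` leaves every profiler inactive, which allows profiling to be toggled without removing the `with` block.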