colossalai.context.parallel_context

class colossalai.context.parallel_context.ParallelContext(*args, **kwargs)[source]

This class provides interface functions for querying the parallel context of each device, such as its global rank, local rank, and world size.

Note

The parallel_mode used in this class must be a member of ParallelMode. More details about ParallelMode can be found in parallel_mode.

load_config(config)[source]

Loads the configuration from either a dict or a file.

Parameters

config (dict or str) – Either a dict containing the configuration information or the filename of a file containing the configuration information.

Raises

TypeError – Raises a TypeError if config is neither a dict nor a str.
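The dict-or-filename dispatch described above can be sketched in plain Python. This is an illustration only, not the library's implementation: load_config_sketch is a hypothetical name, and JSON is assumed as the file format purely so that the example is self-contained.

```python
import json
from pathlib import Path


def load_config_sketch(config):
    """Return a configuration dict from a dict or a filename (hypothetical helper)."""
    if isinstance(config, dict):
        # A dict already is the configuration; copy it so the caller's dict is not mutated.
        return dict(config)
    if isinstance(config, str):
        # A str is treated as a path. JSON is an assumption made for this sketch.
        return json.loads(Path(config).read_text())
    # Anything else is rejected, mirroring the documented TypeError.
    raise TypeError(f"config must be a dict or a str, got {type(config).__name__}")
```

Checking the type up front keeps the error close to the call site instead of surfacing later as a confusing attribute or key error.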

get_global_rank()[source]

Returns the global rank of the current device.

Returns

The global rank of the current device.

Return type

int

add_global_rank(parallel_mode, rank)[source]

Adds the global rank of the current device for parallel_mode to the context.

Parameters
  • parallel_mode (colossalai.context.ParallelMode) – The parallel mode for the rank.

  • rank (int) – The rank to be added.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

get_local_rank(parallel_mode)[source]

Returns the local rank of the current device.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

Returns

The local rank of the current device for parallel_mode.

Return type

int
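To make the difference between a global rank and a local rank concrete, the sketch below treats the local rank as a device's position inside its group. The contiguous group layout and both function names are assumptions made for illustration; the actual grouping depends on the configured parallel modes.

```python
def contiguous_groups(world_size, group_size):
    """Split global ranks 0..world_size-1 into consecutive groups (assumed layout)."""
    return [list(range(start, start + group_size))
            for start in range(0, world_size, group_size)]


def local_rank_of(global_rank, groups):
    """Position of `global_rank` inside the group that contains it."""
    for ranks in groups:
        if global_rank in ranks:
            return ranks.index(global_rank)
    raise ValueError(f"global rank {global_rank} is not in any group")


groups = contiguous_groups(world_size=8, group_size=4)
print(groups)                    # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(local_rank_of(6, groups))  # 2
```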

add_local_rank(parallel_mode, rank)[source]

Adds the local rank of the current device for parallel_mode to the context.

Parameters
  • parallel_mode (colossalai.context.ParallelMode) – The parallel mode for the rank.

  • rank (int) – The rank to be added.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

get_next_global_rank(parallel_mode)[source]

Returns the global rank of the next device.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

Returns

The global rank of the next device for parallel_mode.

Return type

int

get_prev_global_rank(parallel_mode)[source]

Returns the global rank of the previous device.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

Returns

The global rank of the previous device for parallel_mode.

Return type

int
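One way to picture get_next_global_rank and get_prev_global_rank is as neighbour lookups in the list of global ranks that make up a group. The sketch below assumes the lookup wraps around at both ends, which the descriptions above do not state; next_global_rank and prev_global_rank are hypothetical stand-alone functions, not library calls.

```python
def next_global_rank(ranks_in_group, local_rank):
    """Global rank of the device after `local_rank`, wrapping to the start (assumed)."""
    return ranks_in_group[(local_rank + 1) % len(ranks_in_group)]


def prev_global_rank(ranks_in_group, local_rank):
    """Global rank of the device before `local_rank`, wrapping to the end (assumed)."""
    return ranks_in_group[(local_rank - 1) % len(ranks_in_group)]


# Example: a group made of global ranks 1, 3, 5 and 7.
group = [1, 3, 5, 7]
print(next_global_rank(group, 1))  # 5
print(prev_global_rank(group, 0))  # 7 (wraps around)
```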

is_first_rank(parallel_mode)[source]

Returns a boolean value indicating whether the current device is the first one among its group for parallel_mode.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

Returns

A boolean value indicating whether the current device is the first one among its group for parallel_mode.

Return type

bool

is_last_rank(parallel_mode)[source]

Returns a boolean value indicating whether the current device is the last one among its group for parallel_mode.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

Returns

A boolean value indicating whether the current device is the last one among its group for parallel_mode.

Return type

bool
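The first and last checks can be read as comparisons on the local rank. The following sketch assumes that "first" means local rank 0 and "last" means local rank world_size - 1; the function names are hypothetical and take plain integers rather than a parallel mode.

```python
def is_first_in_group(local_rank):
    """True when the device holds local rank 0 (assumed meaning of "first")."""
    return local_rank == 0


def is_last_in_group(local_rank, world_size):
    """True when the device holds the highest local rank in a group of `world_size`."""
    return local_rank == world_size - 1


# Example: a group of four devices, printed as (local rank, first?, last?).
world_size = 4
for local_rank in range(world_size):
    print(local_rank, is_first_in_group(local_rank), is_last_in_group(local_rank, world_size))
```

Checks of this kind are typically used to decide which device feeds input into a pipeline and which one computes the loss.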

get_world_size(parallel_mode)[source]

Returns the world size for parallel_mode.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

Returns

The world size for parallel_mode.

Return type

int

add_world_size(parallel_mode, world_size)[source]

Adds the world size for parallel_mode.

Parameters
  • parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

  • world_size (int) – The world size to be added.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

get_group(parallel_mode)[source]

Returns the group of the current device for parallel_mode.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

Returns

The group of the current device for parallel_mode.

Return type

torch.distributed.ProcessGroup

add_group(parallel_mode, group)[source]

Adds the group of the current device for parallel_mode.

Parameters
  • parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

  • group (torch.distributed.ProcessGroup) – The group to be added.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

get_ranks_in_group(parallel_mode)[source]

Returns the global ranks of the devices in the current device's group for parallel_mode.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

Returns

The global ranks of the devices in the current device's group for parallel_mode.

Return type

list

add_ranks_in_group(parallel_mode, ranks)[source]

Adds the ranks of the current device for parallel_mode in the group.

Parameters
  • parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

  • ranks (list) – List of ranks to be added.

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.

init_global_dist(rank, world_size, backend, host, port)[source]

Initializes the global distributed environment.

Parameters
  • rank (int) – Rank for the default process group.

  • world_size (int) – World size of the default process group.

  • backend (str) – Backend for torch.distributed.

  • host (str) – The master address for distributed training.

  • port (str) – The master port for distributed training.
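The host and port parameters together identify the rendezvous point for the default process group. As a hedged sketch, the helper below combines them into a tcp:// URL of the kind that torch.distributed.init_process_group accepts as its init_method. Whether this class builds the URL in exactly this way is an assumption, and tcp_init_method is a hypothetical name.

```python
def tcp_init_method(host, port):
    """Build a `tcp://host:port` rendezvous URL from the master address and port."""
    return f"tcp://{host}:{port}"


url = tcp_init_method("127.0.0.1", "29500")
print(url)  # tcp://127.0.0.1:29500

# With PyTorch installed and one process per rank, the URL could be passed on as:
#   torch.distributed.init_process_group(
#       backend="nccl", init_method=url, world_size=world_size, rank=rank)
```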

check_sanity()[source]

Checks sanity of the parallel context.

Raises

AssertionError – Raises an AssertionError if the world size does not equal the product of the data parallel size, the pipeline parallel size and the tensor parallel size.
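The condition behind this check is simple arithmetic: every device must belong to exactly one combination of data, pipeline and tensor parallel coordinates. A minimal stand-alone sketch follows, with check_world_size as a hypothetical function that takes the sizes as plain integers.

```python
def check_world_size(world_size, data_parallel_size, pipeline_parallel_size, tensor_parallel_size):
    """Assert that the three parallel sizes multiply to the world size."""
    expected = data_parallel_size * pipeline_parallel_size * tensor_parallel_size
    assert world_size == expected, (
        f"world size {world_size} != {data_parallel_size} (data) * "
        f"{pipeline_parallel_size} (pipeline) * {tensor_parallel_size} (tensor) = {expected}"
    )


check_world_size(8, 2, 2, 2)  # passes: 2 * 2 * 2 == 8
```

For example, 8 devices with a pipeline parallel size of 2 and a tensor parallel size of 2 leave a data parallel size of 2; any other data parallel size would fail the check.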

init_parallel_groups()[source]

Initializes the parallel groups.

Raises

AssertionError – Raises an AssertionError if the field parallel is not present in the config file.

is_initialized(parallel_mode)[source]

Returns a boolean value indicating whether parallel_mode is initialized in the current system.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.

Returns

A boolean value indicating whether parallel_mode is initialized in the current system.

Return type

bool

destroy()[source]

Destroys the current distributed parallel environment.

set_device(device_ordinal=None)[source]

Binds the distributed processes to devices.

Parameters

device_ordinal (int, optional) – The device id to be bound to.
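The text does not say which device is chosen when device_ordinal is omitted. A common convention, assumed here purely for illustration, is to map the global rank onto the devices of a node with a modulo. The function resolve_device_ordinal and its devices_per_node parameter are hypothetical.

```python
def resolve_device_ordinal(global_rank, devices_per_node, device_ordinal=None):
    """Pick a device id: the explicit one if given, else rank modulo devices (assumed)."""
    if device_ordinal is not None:
        return device_ordinal
    return global_rank % devices_per_node


# Two nodes with four devices each: global ranks 0-7 map to device ids 0-3 twice.
print([resolve_device_ordinal(rank, 4) for rank in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```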

set_seed(seed)[source]

Sets seeds for all random libraries.

Parameters

seed (int) – Seed for the random states.
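Seeding matters because it makes random draws repeatable across runs. The sketch below shows the effect with Python's standard random module only. In a real training environment the same seed would presumably also be applied to the other random libraries in use, which are left out here so the example stays self-contained. The name seed_stdlib is hypothetical.

```python
import random


def seed_stdlib(seed):
    """Seed Python's built-in random generator (stand-in for seeding every library)."""
    random.seed(seed)


seed_stdlib(1024)
first = [random.random() for _ in range(3)]

seed_stdlib(1024)
second = [random.random() for _ in range(3)]

print(first == second)  # True: the same seed reproduces the same sequence
```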