colossalai.context.parallel_context

class colossalai.context.parallel_context.ParallelContext(*args, **kwargs)

This class provides interface functions for querying the parallel context of each device, such as its global rank, local rank, and world size.

load_config(config)

Loads the configuration from either a dict or a file.

Parameters

config (dict or str) – Either a dict containing the configuration or the path to a file containing it

Raises

TypeError – Raises a TypeError if config is neither a dict nor a str
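The dict-or-filename dispatch described above can be sketched as follows. This is a minimal illustration, not the library's implementation: load_config_sketch is a hypothetical name, and reading the file as JSON is an assumption made only to keep the example self-contained (the format the library actually expects is not stated here).

```python
import json
from pathlib import Path


def load_config_sketch(config):
    """Return a configuration dict from either a dict or a file path.

    Hypothetical helper: mirrors the documented contract (dict or str,
    otherwise TypeError), not ColossalAI's actual code.
    """
    if isinstance(config, dict):
        # Copy so later changes by the caller do not leak into the context.
        return dict(config)
    if isinstance(config, str):
        # Assumption: the file holds JSON. The real file format may differ.
        return json.loads(Path(config).read_text())
    raise TypeError(
        f"config must be a dict or a str, got {type(config).__name__}"
    )
```

Passing anything other than a dict or a str, for example an int, raises the TypeError described under Raises.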

get_global_rank()

Returns the global rank of the current device.

Returns

The global rank of the current device

Return type

int

add_global_rank(parallel_mode, rank)

Adds the global rank of the current device for parallel_mode to the context.

Parameters
  • parallel_mode (colossalai.context.ParallelMode) – The parallel mode for the rank

  • rank (int) – The rank to be added

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

get_local_rank(parallel_mode)

Returns the local rank of the current device.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

Returns

The local rank of the current device for parallel_mode

Return type

int
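To illustrate how a local rank differs from a global rank, the sketch below computes both for a tensor-parallel group. The contiguous grouping of ranks and the name tensor_group_for are assumptions made for illustration. The layout the library actually uses for each parallel mode is not specified in this reference.

```python
def tensor_group_for(global_rank, world_size, tensor_parallel_size):
    """Return (ranks_in_group, local_rank) for one device.

    Assumption: devices are grouped contiguously, e.g. with 8 devices and
    a tensor parallel size of 4 the groups are [0..3] and [4..7].
    """
    assert world_size % tensor_parallel_size == 0, "sizes must divide evenly"
    assert 0 <= global_rank < world_size, "global rank out of range"
    start = (global_rank // tensor_parallel_size) * tensor_parallel_size
    ranks_in_group = list(range(start, start + tensor_parallel_size))
    # The local rank is the device's position inside its own group.
    local_rank = ranks_in_group.index(global_rank)
    return ranks_in_group, local_rank


ranks, local = tensor_group_for(global_rank=5, world_size=8, tensor_parallel_size=4)
print(ranks, local)  # [4, 5, 6, 7] 1
```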

add_local_rank(parallel_mode, rank)

Adds the local rank of the current device for parallel_mode to the context.

Parameters
  • parallel_mode (colossalai.context.ParallelMode) – The parallel mode for the rank

  • rank (int) – The rank to be added

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

get_next_global_rank(parallel_mode)

Returns the global rank of the next device.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

Returns

The global rank of the next device for parallel_mode

Return type

int

get_prev_global_rank(parallel_mode)

Returns the global rank of the previous device.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

Returns

The global rank of the previous device for parallel_mode

Return type

int
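One plausible way to derive the next and previous global ranks is to step through the list of ranks in the group. The sketch below assumes the neighbours wrap around at the ends of the group, which the descriptions above do not state. The function names and the example ranks are hypothetical.

```python
def next_global_rank(ranks_in_group, local_rank):
    """Global rank of the device after local_rank, wrapping at the end."""
    return ranks_in_group[(local_rank + 1) % len(ranks_in_group)]


def prev_global_rank(ranks_in_group, local_rank):
    """Global rank of the device before local_rank, wrapping at the start."""
    return ranks_in_group[(local_rank - 1) % len(ranks_in_group)]


# Hypothetical pipeline group made of global ranks 2, 6, 10 and 14.
pipeline_ranks = [2, 6, 10, 14]
print(next_global_rank(pipeline_ranks, 1))  # 10
print(next_global_rank(pipeline_ranks, 3))  # 2  (wraps to the first device)
print(prev_global_rank(pipeline_ranks, 0))  # 14 (wraps to the last device)
```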

is_first_rank(parallel_mode)

Returns a boolean value indicating whether the current device is the first one among its group for parallel_mode.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

Returns

True if the current device is the first one in its group for parallel_mode, False otherwise

Return type

bool

is_last_rank(parallel_mode)

Returns a boolean value indicating whether the current device is the last one among its group for parallel_mode.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

Returns

True if the current device is the last one in its group for parallel_mode, False otherwise

Return type

bool
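The first-rank and last-rank checks can be expressed in terms of the local rank and the world size of the group. This is a sketch under the assumption that local ranks run from 0 to world_size - 1. The function names are illustrative only.

```python
def is_first_in_group(local_rank):
    """True when the device holds local rank 0 in its group."""
    return local_rank == 0


def is_last_in_group(local_rank, world_size):
    """True when the device holds the highest local rank in its group."""
    return local_rank == world_size - 1


# Typical use: only the last pipeline stage computes the loss.
for local_rank in range(4):
    role = "loss" if is_last_in_group(local_rank, 4) else "forward only"
    print(local_rank, role)
```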

get_world_size(parallel_mode)

Returns the world size for parallel_mode.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

Returns

The world size for parallel_mode

Return type

int

add_world_size(parallel_mode, world_size)

Adds the world size for parallel_mode to the context.

Parameters
  • parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

  • world_size (int) – The world size to be added

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

get_group(parallel_mode)

Returns the group of the current device for parallel_mode.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

Returns

The group of the current device for parallel_mode

Return type

torch.distributed.ProcessGroup

add_group(parallel_mode, group)

Adds the group of the current device for parallel_mode.

Parameters
  • parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

  • group (torch.distributed.ProcessGroup) – The group to be added

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

get_ranks_in_group(parallel_mode)

Returns the global ranks of all devices in the current device's group for parallel_mode.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

Returns

The global ranks of all devices in the current device's group for parallel_mode

Return type

list

add_ranks_in_group(parallel_mode, ranks)

Adds the ranks of the devices in the current device's group for parallel_mode to the context.

Parameters
  • parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

  • ranks (list) – List of ranks to be added

Raises

AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode

init_global_dist(rank, world_size, backend, host, port)

Initializes the global distributed environment.

Parameters
  • rank (int) – Rank for the default process group

  • world_size (int) – World size of the default process group

  • backend (str) – Backend for torch.distributed

  • host (str) – The master address for distributed training

  • port (str) – The master port for distributed training
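The five arguments map naturally onto the keyword arguments of torch.distributed.init_process_group. The sketch below only assembles those arguments and does not import PyTorch. Combining host and port into a tcp:// rendezvous URL is an assumption about how the master address is used, and build_init_kwargs is a hypothetical helper.

```python
def build_init_kwargs(rank, world_size, backend, host, port):
    """Assemble keyword arguments for torch.distributed.init_process_group."""
    return {
        "rank": rank,
        "world_size": world_size,
        "backend": backend,
        # Assumption: processes rendezvous over TCP at host:port.
        "init_method": f"tcp://{host}:{port}",
    }


kwargs = build_init_kwargs(
    rank=0, world_size=4, backend="nccl", host="127.0.0.1", port="29500"
)
# With PyTorch installed, these could be passed on as:
#   torch.distributed.init_process_group(**kwargs)
```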

check_sanity()

Checks sanity of the parallel context.

Raises

AssertionError – Raises an AssertionError if the world size does not equal the product of the data parallel size, pipeline parallel size, and tensor parallel size
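The condition being checked is a simple product identity. A minimal standalone sketch, with a hypothetical function name and explicit arguments in place of the values held by the context:

```python
def check_sizes(world_size, data_parallel_size,
                pipeline_parallel_size, tensor_parallel_size):
    """Assert that the three parallel sizes exactly account for all devices."""
    expected = (
        data_parallel_size * pipeline_parallel_size * tensor_parallel_size
    )
    assert world_size == expected, (
        f"world size {world_size} != {data_parallel_size} (data) * "
        f"{pipeline_parallel_size} (pipeline) * {tensor_parallel_size} (tensor)"
    )


check_sizes(8, 2, 2, 2)  # passes: 2 * 2 * 2 == 8
```

For example, 8 devices with data, pipeline, and tensor parallel sizes of 2, 2, and 3 would fail the check, because 2 * 2 * 3 = 12.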

init_parallel_groups()

Initializes the parallel groups.

Raises

AssertionError – Raises an AssertionError if the field parallel is not present in the config file

is_initialized(parallel_mode)

Returns a boolean value indicating whether parallel_mode is initialized in the current system.

Parameters

parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode

Returns

True if parallel_mode is initialized in the current system, False otherwise

Return type

bool

destroy()

Destroys the current distributed parallel environment.

set_device(device_ordinal=None)

Binds the current distributed process to a device.

Parameters

device_ordinal (int, optional) – The ID of the device to bind to

set_seed(seed)

Sets seeds for all random libraries.

Parameters

seed (int) – Seed for the random states
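Seeding several random number generators at once usually looks like the sketch below. Which libraries the method actually seeds, and whether it derives different seeds per parallel mode, is not stated in this reference. The set of libraries here (random, NumPy, PyTorch) and the name seed_everything are assumptions.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG and, when installed, NumPy's and PyTorch's."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch.
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is optional in this sketch.


seed_everything(1024)
first = random.random()
seed_everything(1024)
second = random.random()
# Re-seeding with the same value reproduces the same sequence.
assert first == second
```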