colossalai.context.parallel_context
- class colossalai.context.parallel_context.ParallelContext(*args, **kwargs)
This class provides interface functions for users to get the parallel context, such as the global rank, the local rank, the world size, etc. of each device.
- load_config(config)
Loads the configuration from either a dict or a file.
- Parameters
config (dict or str) – Either a dict containing the configuration information or the filename of a file containing the configuration information
- Raises
TypeError – Raises a TypeError if config is neither a dict nor a str
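The dict-or-filename dispatch described for load_config can be illustrated with a minimal standalone sketch. The helper name load_config_sketch is hypothetical, and treating the file as a Python script executed with runpy is an assumption made for illustration; this is not the library's own loader.

```python
import runpy


def load_config_sketch(config):
    """Return a plain dict from either a dict or a path to a Python config file."""
    if isinstance(config, dict):
        # A dict is used directly (copied so the caller's object is not mutated).
        return dict(config)
    if isinstance(config, str):
        # Assumption: the file is a Python script whose top-level names are the settings.
        namespace = runpy.run_path(config)
        # Drop dunder entries such as __name__ and __builtins__ added by runpy.
        return {k: v for k, v in namespace.items() if not k.startswith("_")}
    # Anything else is rejected, matching the documented TypeError.
    raise TypeError(f"config must be a dict or a str, got {type(config).__name__}")
```

Passing any other type, for example an int, raises the TypeError documented above.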
- get_global_rank()
Returns the global rank of the current device.
- Returns
The global rank of the current device
- Return type
int
- add_global_rank(parallel_mode, rank)
Adds the global rank of the current device for parallel_mode to the context.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The parallel mode for the rank
rank (int) – The rank to be added
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- get_local_rank(parallel_mode)
Returns the local rank of the current device.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- Returns
The local rank of the current device for parallel_mode
- Return type
int
- add_local_rank(parallel_mode, rank)
Adds the local rank of the current device for parallel_mode to the context.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The parallel mode for the rank
rank (int) – The rank to be added
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
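The add/get pairing and the AssertionError on a wrong parallel_mode type can be sketched as a small standalone registry. ParallelModeSketch and RankRegistrySketch are hypothetical stand-ins written for this example; they only mirror the documented behavior and do not reproduce the library's internals.

```python
from enum import Enum


class ParallelModeSketch(Enum):
    """Hypothetical stand-in for colossalai.context.ParallelMode."""
    DATA = "data"
    PIPELINE = "pipe"


class RankRegistrySketch:
    """Stores one local rank per parallel mode."""

    def __init__(self):
        self._local_ranks = {}

    @staticmethod
    def _check_parallel_mode(parallel_mode):
        # Mirrors the documented AssertionError for a non-ParallelMode argument.
        assert isinstance(parallel_mode, ParallelModeSketch), (
            f"expected a ParallelModeSketch, got {type(parallel_mode).__name__}"
        )

    def add_local_rank(self, parallel_mode, rank):
        self._check_parallel_mode(parallel_mode)
        self._local_ranks[parallel_mode] = rank

    def get_local_rank(self, parallel_mode):
        self._check_parallel_mode(parallel_mode)
        return self._local_ranks[parallel_mode]


registry = RankRegistrySketch()
registry.add_local_rank(ParallelModeSketch.DATA, 3)
local_rank = registry.get_local_rank(ParallelModeSketch.DATA)
```

Passing a plain string such as "data" instead of an enum member trips the assertion in this sketch.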
- get_next_global_rank(parallel_mode)
Returns the global rank of the next device.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- Returns
The global rank of the next device for parallel_mode
- Return type
int
- get_prev_global_rank(parallel_mode)
Returns the global rank of the previous device.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- Returns
The global rank of the previous device for parallel_mode
- Return type
int
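One way to picture the next and previous global ranks is as neighbors in the list of global ranks that make up the current group. The sketch below assumes the group is treated as a ring, so the last device wraps around to the first. That wrap-around and both function names are assumptions for illustration and are not stated by this reference.

```python
def next_global_rank_sketch(ranks_in_group, local_rank):
    """Global rank of the device one position after local_rank, wrapping at the end."""
    world_size = len(ranks_in_group)
    return ranks_in_group[(local_rank + 1) % world_size]


def prev_global_rank_sketch(ranks_in_group, local_rank):
    """Global rank of the device one position before local_rank, wrapping at the start."""
    world_size = len(ranks_in_group)
    return ranks_in_group[(local_rank - 1) % world_size]


# Example group: four devices whose global ranks are 2, 6, 10 and 14.
group = [2, 6, 10, 14]
next_of_last = next_global_rank_sketch(group, local_rank=3)   # wraps to the first device
prev_of_first = prev_global_rank_sketch(group, local_rank=0)  # wraps to the last device
```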
- is_first_rank(parallel_mode)
Returns a boolean value indicating whether the current device is the first one among its group for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- Returns
A boolean value indicating whether the current device is the first one among its group for parallel_mode
- Return type
bool
- is_last_rank(parallel_mode)
Returns a boolean value indicating whether the current device is the last one among its group for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- Returns
A boolean value indicating whether the current device is the last one among its group for parallel_mode
- Return type
bool
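The first/last checks can be read as comparisons on the local rank within the group. The sketch below assumes local ranks run from 0 to world_size - 1, so "first" means local rank 0 and "last" means local rank world_size - 1. Both helper names are hypothetical.

```python
def is_first_rank_sketch(local_rank):
    """True when the device holds position 0 in its group."""
    return local_rank == 0


def is_last_rank_sketch(local_rank, world_size):
    """True when the device holds the final position in its group."""
    return local_rank == world_size - 1
```

In a pipeline-style setup, checks like these are typically used to decide which device reads the input and which one computes the loss.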
- get_world_size(parallel_mode)
Returns the world size for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- Returns
The world size for parallel_mode
- Return type
int
- add_world_size(parallel_mode, world_size)
Adds the world size for parallel_mode to the context.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
world_size (int) – The world size to be added
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- get_group(parallel_mode)
Returns the group of the current device for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- Returns
The group of the current device for parallel_mode
- Return type
torch.distributed.ProcessGroup
- add_group(parallel_mode, group)
Adds the group of the current device for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
group (torch.distributed.ProcessGroup) – The group to be added
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- get_ranks_in_group(parallel_mode)
Returns the rank of the current device for parallel_mode in the group.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- Returns
The rank of the current device for parallel_mode in the group
- Return type
int
- add_ranks_in_group(parallel_mode, ranks)
Adds the ranks of the current device for parallel_mode in the group.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
ranks (list) – List of ranks to be added
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode
- init_global_dist(rank, world_size, backend, host, port)
Initializes the global distributed environment.
- Parameters
rank (int) – Rank for the default process group
world_size (int) – World size of the default process group
backend (str) – Backend for torch.distributed
host (str) – The master address for distributed training
port (str) – The master port for distributed training
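The five arguments map naturally onto the keyword arguments of torch.distributed.init_process_group. The sketch below only assembles those arguments, so it runs without PyTorch. Combining host and port into a tcp:// init_method URL is an assumption about how they are used, and build_init_kwargs_sketch is a hypothetical helper.

```python
def build_init_kwargs_sketch(rank, world_size, backend, host, port):
    """Collect keyword arguments suitable for torch.distributed.init_process_group."""
    return {
        "backend": backend,
        # Assumption: the master address and port are combined into a TCP rendezvous URL.
        "init_method": f"tcp://{host}:{port}",
        "rank": rank,
        "world_size": world_size,
    }


init_kwargs = build_init_kwargs_sketch(
    rank=0, world_size=4, backend="nccl", host="127.0.0.1", port="29500"
)
# With PyTorch installed, these could be forwarded as:
#   torch.distributed.init_process_group(**init_kwargs)
```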
- check_sanity()
Checks sanity of the parallel context.
- Raises
AssertionError – Raises an AssertionError if the world size is not equal to the product of the data parallel size, the pipeline parallel size, and the tensor parallel size
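The condition checked here is plain arithmetic: the three parallel sizes must multiply to the world size. A minimal sketch of that check, with a hypothetical function name and argument names, might look like this:

```python
def check_sanity_sketch(world_size, data_parallel_size,
                        pipeline_parallel_size, tensor_parallel_size):
    """Assert that the three parallel sizes exactly tile the world size."""
    expected = data_parallel_size * pipeline_parallel_size * tensor_parallel_size
    assert world_size == expected, (
        f"world size {world_size} != data ({data_parallel_size}) * "
        f"pipeline ({pipeline_parallel_size}) * tensor ({tensor_parallel_size}) = {expected}"
    )


# 8 devices split as 2-way data, 2-way pipeline and 2-way tensor parallelism: passes.
check_sanity_sketch(8, 2, 2, 2)
```

For example, 8 devices cannot be arranged as 2-way data, 2-way pipeline, and 3-way tensor parallelism, because 2 * 2 * 3 = 12.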
- init_parallel_groups()
Initializes the parallel groups.
- Raises
AssertionError – Raises an AssertionError if the field parallel is not present in the config file
- is_initialized(parallel_mode)
Returns a boolean value indicating whether parallel_mode is initialized in the current system.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode
- Returns
A boolean value indicating whether parallel_mode is initialized in the current system
- Return type
bool
- destroy()
Destroys the current distributed parallel environment.
- set_device(device_ordinal=None)
Sets distributed processes to be bound to devices.
- Parameters
device_ordinal (int, optional) – The id of the device to bind to
- set_seed(seed)
Sets seeds for all random libraries.
- Parameters
seed (int) – Seed for the random states
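Seeding "all random libraries" usually means Python's random module, NumPy, and PyTorch. The sketch below shows that general pattern under the assumption that NumPy and PyTorch are optional; set_seed_sketch is a hypothetical helper and does not reproduce any per-parallel-mode seeding the library may perform.

```python
import random


def set_seed_sketch(seed):
    """Seed the common random number generators with the same value."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # seeds NumPy's legacy global generator
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds PyTorch's default generator
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed


set_seed_sketch(1234)
first_draw = random.random()
set_seed_sketch(1234)
second_draw = random.random()  # same value as first_draw, since the seed was reset
```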