colossalai.context.parallel_context
- class colossalai.context.parallel_context.ParallelContext(*args, **kwargs)[source]
This class provides interface functions for users to get the parallel context, such as the global rank, the local rank, the world size, etc. of each device.
Note
The parallel_mode used in this class should be a member of ParallelMode. More details about ParallelMode can be found in parallel_mode.
- load_config(config)[source]
Loads the configuration from either a dict or a file.
- Parameters
config (dict or str) – Either a dict containing the configuration information or the filename of a file containing the configuration information.
- Raises
TypeError – Raises a TypeError if config is neither a dict nor a str.
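The dict-or-filename contract described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `load_config_sketch` is a hypothetical name, and reading the file as JSON is an assumption made only for the example, since the file format is not stated in this section.

```python
import json
from pathlib import Path


def load_config_sketch(config):
    """Hypothetical stand-in for load_config: accept a dict or a filename."""
    if isinstance(config, dict):
        # A dict is used directly (copied so the caller's dict is untouched).
        return dict(config)
    if isinstance(config, str):
        # Assumption: the file is JSON. The real file format is not stated here.
        return json.loads(Path(config).read_text())
    # Anything else is rejected, matching the documented TypeError.
    raise TypeError(f"config must be a dict or a str, got {type(config).__name__}")
```

Passing anything other than a dict or a str, for example `load_config_sketch(3)`, raises a TypeError in this sketch.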
- get_global_rank()[source]
Returns the global rank of the current device.
- Returns
The global rank of the current device.
- Return type
int
- add_global_rank(parallel_mode, rank)[source]
Adds the global rank of the current device for parallel_mode to the context.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The parallel mode for the rank.
rank (int) – The rank to be added.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- get_local_rank(parallel_mode)[source]
Returns the local rank of the current device.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- Returns
The local rank of the current device for parallel_mode.
- Return type
int
- add_local_rank(parallel_mode, rank)[source]
Adds the local rank of the current device for parallel_mode to the context.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The parallel mode for the rank.
rank (int) – The rank to be added.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
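The add/get pattern and the AssertionError on an invalid parallel_mode can be illustrated with a toy registry. Everything below is hypothetical: `ToyParallelMode` and `ToyContext` are stand-ins written for this example and do not reproduce the library's classes or internals.

```python
from enum import Enum


class ToyParallelMode(Enum):
    """Hypothetical stand-in for colossalai.context.ParallelMode."""
    GLOBAL = "global"
    DATA = "data"
    TENSOR = "tensor"


class ToyContext:
    """Hypothetical per-mode rank registry, not the library's implementation."""

    def __init__(self):
        self._global_ranks = {}
        self._local_ranks = {}

    @staticmethod
    def _check_mode(parallel_mode):
        # Mirrors the documented contract: reject anything that is not a mode.
        assert isinstance(parallel_mode, ToyParallelMode), "not a ToyParallelMode"

    def add_global_rank(self, parallel_mode, rank):
        self._check_mode(parallel_mode)
        self._global_ranks[parallel_mode] = rank

    def add_local_rank(self, parallel_mode, rank):
        self._check_mode(parallel_mode)
        self._local_ranks[parallel_mode] = rank

    def get_local_rank(self, parallel_mode):
        self._check_mode(parallel_mode)
        return self._local_ranks[parallel_mode]


ctx = ToyContext()
ctx.add_global_rank(ToyParallelMode.GLOBAL, 5)  # position among all devices
ctx.add_local_rank(ToyParallelMode.DATA, 1)     # position inside the data-parallel group
```

In this sketch, `ctx.get_local_rank(ToyParallelMode.DATA)` returns the rank stored for that mode, while passing the plain string `"data"` trips the assertion.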
- get_next_global_rank(parallel_mode)[source]
Returns the global rank of the next device.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- Returns
The global rank of the next device for parallel_mode.
- Return type
int
- get_prev_global_rank(parallel_mode)[source]
Returns the global rank of the previous device.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- Returns
The global rank of the previous device for parallel_mode.
- Return type
int
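One plausible way to derive the next and previous global ranks is to step through the list of ranks in the group. The sketch below assumes that the group is an ordered list of global ranks and that stepping wraps around at both ends; neither assumption is stated in this section, and the function names are hypothetical.

```python
def next_global_rank(ranks_in_group, local_rank):
    """Global rank of the device after local_rank, wrapping at the end (assumed)."""
    return ranks_in_group[(local_rank + 1) % len(ranks_in_group)]


def prev_global_rank(ranks_in_group, local_rank):
    """Global rank of the device before local_rank, wrapping at the start (assumed)."""
    return ranks_in_group[(local_rank - 1) % len(ranks_in_group)]


# Hypothetical pipeline group made of global ranks 1, 3, 5 and 7.
group = [1, 3, 5, 7]
print(next_global_rank(group, 1))  # 5
print(prev_global_rank(group, 0))  # 7 (wraps around to the last member)
```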
- is_first_rank(parallel_mode)[source]
Returns a boolean value indicating whether the current device is the first one among its group for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- Returns
A boolean value indicating whether the current device is the first one among its group for parallel_mode.
- Return type
bool
- is_last_rank(parallel_mode)[source]
Returns a boolean value indicating whether the current device is the last one among its group for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- Returns
A boolean value indicating whether the current device is the last one among its group for parallel_mode.
- Return type
bool
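The first-rank and last-rank checks can be sketched in terms of the local rank and the group's world size. This assumes local ranks run from 0 to world_size - 1, which this section does not state explicitly; the helper names are hypothetical and take plain integers rather than a parallel mode.

```python
def is_first_rank_sketch(local_rank):
    """True if the device sits at position 0 of its group (assumed numbering)."""
    return local_rank == 0


def is_last_rank_sketch(local_rank, world_size):
    """True if the device sits at the final position of its group."""
    return local_rank == world_size - 1


# Flags for each stage of a hypothetical 4-stage pipeline group.
world_size = 4
flags = [
    (rank, is_first_rank_sketch(rank), is_last_rank_sketch(rank, world_size))
    for rank in range(world_size)
]
```

Checks like these are typically used to decide which pipeline stage loads the input batch and which one computes the loss.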
- get_world_size(parallel_mode)[source]
Returns the world size for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- Returns
The world size for parallel_mode.
- Return type
int
- add_world_size(parallel_mode, world_size)[source]
Adds world size for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
world_size (int) – The world size to be added.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- get_group(parallel_mode)[source]
Returns the group of the current device for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- Returns
The group of the current device for parallel_mode.
- Return type
torch.distributed.ProcessGroup
- add_group(parallel_mode, group)[source]
Adds the group of the current device for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
group (torch.distributed.ProcessGroup) – The group to be added.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- get_ranks_in_group(parallel_mode)[source]
Returns the ranks of the devices in the current device's group for parallel_mode.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- Returns
The ranks of the devices in the current device's group for parallel_mode.
- Return type
list
- add_ranks_in_group(parallel_mode, ranks)[source]
Adds the ranks of the current device for parallel_mode in the group.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
ranks (list) – List of ranks to be added.
- Raises
AssertionError – Raises an AssertionError if parallel_mode is not an instance of colossalai.context.ParallelMode.
- init_global_dist(rank, world_size, backend, host, port)[source]
Initializes the global distributed environment.
- Parameters
rank (int) – rank for the default process group.
world_size (int) – world size of the default process group.
backend (str) – backend for torch.distributed.
host (str) – the master address for distributed training.
port (str) – the master port for distributed training.
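As a rough sketch of how these five parameters could map onto PyTorch, the example below builds a TCP rendezvous URL from host and port and passes it to `torch.distributed.init_process_group`. The use of a `tcp://` init method is an assumption made for illustration, and both function names are hypothetical; the sketch does not claim to reproduce what init_global_dist does internally.

```python
def build_init_method(host, port):
    """Build a TCP rendezvous URL from the master address and port."""
    return f"tcp://{host}:{port}"


def init_global_dist_sketch(rank, world_size, backend, host, port):
    """Hypothetical stand-in; requires PyTorch and one running process per rank."""
    # Imported lazily so build_init_method can be used without PyTorch installed.
    import torch.distributed as dist

    dist.init_process_group(
        backend=backend,
        init_method=build_init_method(host, port),
        rank=rank,
        world_size=world_size,
    )
```

Each participating process would call `init_global_dist_sketch` with its own rank and the same world_size, backend, host, and port; the call blocks until all ranks have joined.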
- check_sanity()[source]
Checks sanity of the parallel context.
- Raises
AssertionError – Raises an AssertionError if the world size does not equal the product of the data parallel size, the pipeline parallel size, and the tensor parallel size.
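The sanity condition amounts to a single arithmetic check. A minimal sketch, with a hypothetical function name and with the three sizes passed in explicitly rather than read from the context:

```python
def check_sanity_sketch(world_size, data_size, pipeline_size, tensor_size):
    """Assert that the three parallel sizes multiply out to the world size."""
    expected = data_size * pipeline_size * tensor_size
    assert world_size == expected, (
        f"world size {world_size} != data ({data_size}) * "
        f"pipeline ({pipeline_size}) * tensor ({tensor_size}) = {expected}"
    )


# 8 devices split as 2-way data, 2-way pipeline and 2-way tensor parallelism.
check_sanity_sketch(world_size=8, data_size=2, pipeline_size=2, tensor_size=2)
```

With 8 devices, a 2 x 2 x 3 layout would fail the check because it needs 12 devices.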
- init_parallel_groups()[source]
Initializes the parallel groups.
- Raises
AssertionError – Raises an AssertionError if the field parallel is not present in the config file.
- is_initialized(parallel_mode)[source]
Returns a boolean value indicating whether parallel_mode is initialized in the current system.
- Parameters
parallel_mode (colossalai.context.ParallelMode) – The chosen parallel mode.
- Returns
A boolean value indicating whether parallel_mode is initialized in the current system.
- Return type
bool