colossalai.communication
- colossalai.communication.all_gather(tensor, dim, parallel_mode, async_op=False)
Gathers all tensors from the parallel group and concatenates them along a specified dimension.
- Parameters
tensor (torch.Tensor) – Tensor to be gathered
dim (int) – The dimension along which to concatenate
parallel_mode (colossalai.context.ParallelMode) – Parallel group mode used in this communication
async_op (bool, optional) – Whether operations are asynchronous
- Returns
The tensor generated by all-gather
- Return type
torch.Tensor
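The effect of the `dim` argument can be illustrated with a single-process simulation. This is only a sketch of all-gather semantics on nested lists: it does not call colossalai or torch.distributed, and `simulate_all_gather` is a hypothetical helper introduced for illustration.

```python
# Single-process simulation of all-gather semantics on 2-D shards.
# `simulate_all_gather` is a hypothetical helper, not part of colossalai.

def simulate_all_gather(shards, dim):
    """Concatenate one 2-D shard per rank along `dim` (0 = rows, 1 = columns)."""
    if dim == 0:
        # Stack the rows of every rank's shard, in rank order.
        return [list(row) for shard in shards for row in shard]
    if dim == 1:
        # Join the matching rows of every rank's shard side by side.
        return [sum((list(row) for row in rows), []) for rows in zip(*shards)]
    raise ValueError("this sketch only handles 2-D shards (dim 0 or 1)")

# Two ranks, each holding a 2x2 shard.
rank0 = [[1, 2], [3, 4]]
rank1 = [[5, 6], [7, 8]]

gathered_rows = simulate_all_gather([rank0, rank1], dim=0)  # 4x2 result
gathered_cols = simulate_all_gather([rank0, rank1], dim=1)  # 2x4 result
```

In a real all-gather, every rank in the group ends up holding the same concatenated result.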
- colossalai.communication.reduce_scatter(tensor, dim, parallel_mode, op=<ReduceOp.SUM: 0>, async_op=False)
Reduces all tensors, then scatters the result along a specified dimension to all members in the parallel group.
- Parameters
tensor (torch.Tensor) – Tensor to be reduced and scattered
dim (int) – The dimension along which to scatter
parallel_mode (colossalai.context.ParallelMode) – Parallel group mode used in this communication
op (ReduceOp, optional) – The type of reduce operation
async_op (bool, optional) – Whether operations are asynchronous
- Returns
The tensor generated by reduce-scatter
- Return type
torch.Tensor
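As a rough illustration of reduce-scatter with the default sum reduction, the following single-process sketch works on plain lists. It does not call colossalai or torch.distributed; `simulate_reduce_scatter` is a hypothetical helper, and the equal-chunk split is an assumption of this sketch.

```python
# Single-process simulation of reduce-scatter with a sum reduction.
# `simulate_reduce_scatter` is a hypothetical helper, not part of colossalai.

def simulate_reduce_scatter(inputs):
    """Sum the per-rank 1-D inputs element-wise, then give each rank one equal chunk."""
    world_size = len(inputs)
    length = len(inputs[0])
    if any(len(x) != length for x in inputs):
        raise ValueError("every rank must contribute the same number of elements")
    if length % world_size != 0:
        raise ValueError("length must be divisible by the number of ranks")
    # Reduce step: element-wise sum across ranks.
    reduced = [sum(values) for values in zip(*inputs)]
    # Scatter step: rank r keeps the r-th contiguous chunk.
    chunk = length // world_size
    return [reduced[r * chunk:(r + 1) * chunk] for r in range(world_size)]

# Two ranks, each contributing four elements.
rank_inputs = [[1, 2, 3, 4], [10, 20, 30, 40]]
outputs = simulate_reduce_scatter(rank_inputs)  # outputs[r] is what rank r receives
```

Each rank receives a tensor that is smaller than its input by a factor of the group size along the scattered dimension.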
- colossalai.communication.send_forward(output_tensor, next_rank=None, scatter_gather_tensors=False)
Sends the input tensor to the next member in the pipeline.
- Parameters
output_tensor (torch.Tensor) – Tensor to be sent
next_rank (int, optional) – The rank of the recipient of the tensor
- colossalai.communication.send_forward_recv_forward(output_tensor, input_tensor_shape, recv_prev=True, prev_rank=None, next_rank=None, dtype=torch.float32, scatter_gather_tensors=False)
Batched communication operation. Sends the input tensor to the next member in the pipeline while receiving the input tensor from the previous member in the pipeline.
- Parameters
output_tensor (torch.Tensor) – Tensor to be sent
input_tensor_shape (torch.Size) – The shape of the tensor to be received
- Returns
The input tensor in forward step
- Return type
torch.Tensor
- colossalai.communication.send_forward_backward_recv_forward_backward(output_tensor, input_tensor_grad, input_tensor_shape, output_grad_shape, recv_prev=True, recv_next=True, prev_rank=None, next_rank=None, dtype=torch.float32, scatter_gather_tensors=False)
Batched communication operation. Sends the input tensor to the next member and the grad tensor to the previous member, while receiving the grad tensor from the next member and the input tensor from the previous member.
- Parameters
output_tensor (torch.Tensor) – Tensor sent to the next member
input_tensor_grad (torch.Tensor) – Tensor sent to the previous member
input_tensor_shape (torch.Size) – The shape of the tensor received from the previous member
output_grad_shape (torch.Size) – The shape of the tensor received from the next member
- Returns
(the input tensor in forward step, the grad of output tensor in forward step)
- Return type
(torch.Tensor, torch.Tensor)
- colossalai.communication.send_backward(input_tensor_grad, prev_rank=None, scatter_gather_tensors=False)
Sends the grad tensor to the previous member in the pipeline.
- Parameters
input_tensor_grad (torch.Tensor) – Tensor to be sent
prev_rank (int, optional) – The rank of the recipient of the tensor
- colossalai.communication.send_backward_recv_backward(input_tensor_grad, output_grad_shape, recv_next=True, prev_rank=None, next_rank=None, dtype=torch.float32, scatter_gather_tensors=False)
Batched communication operation. Sends the grad tensor to the previous member in the pipeline while receiving the grad tensor from the next member in the pipeline.
- Parameters
input_tensor_grad (torch.Tensor) – Tensor to be sent
output_grad_shape (torch.Size) – The shape of the tensor to be received
- Returns
The grad of output tensor in forward step
- Return type
torch.Tensor
- colossalai.communication.send_backward_recv_forward(input_tensor_grad, input_tensor_shape, recv_prev=True, prev_rank=None, dtype=torch.float32, scatter_gather_tensors=False)
Batched communication operation. Sends the grad tensor to the previous member in the pipeline while receiving the input tensor from the previous member in the pipeline.
- Parameters
input_tensor_grad (torch.Tensor) – Tensor to be sent
input_tensor_shape (torch.Size) – The shape of the tensor to be received
- Returns
The input tensor in forward step
- Return type
torch.Tensor
- colossalai.communication.send_forward_recv_backward(output_tensor, output_grad_shape, recv_next=True, next_rank=None, dtype=torch.float32, scatter_gather_tensors=False)
Batched communication operation. Sends the input tensor to the next member in the pipeline while receiving the grad tensor from the next member in the pipeline.
- Parameters
output_tensor (torch.Tensor) – Tensor to be sent
output_grad_shape (torch.Size) – The shape of the tensor to be received
- Returns
The grad of output tensor in forward step
- Return type
torch.Tensor
- colossalai.communication.recv_backward(output_grad_shape, next_rank=None, dtype=torch.float32, scatter_gather_tensors=False)
Receives the grad tensor from the next member in the pipeline.
- Parameters
output_grad_shape (torch.Size) – The shape of the tensor to be received
next_rank (int, optional) – The rank of the source of the tensor
- Returns
The grad of output tensor in forward step
- Return type
torch.Tensor
- colossalai.communication.recv_forward(input_tensor_shape, prev_rank=None, dtype=torch.float32, scatter_gather_tensors=False)
Receives the input tensor from the previous member in the pipeline.
- Parameters
input_tensor_shape (torch.Size) – The shape of the tensor to be received
prev_rank (int, optional) – The rank of the source of the tensor
- Returns
The input tensor in forward step
- Return type
torch.Tensor
- colossalai.communication.ring_forward(tensor_send_next, parallel_mode)
Sends a tensor to the next member and receives a tensor from the previous member. This function returns the tensor received from the previous member.
- Parameters
tensor_send_next (torch.Tensor) – Tensor sent to the next member
parallel_mode (colossalai.context.ParallelMode) – Parallel group mode used in this communication
- Returns
The tensor received from the previous member
- Return type
torch.Tensor
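The send-to-next, receive-from-previous pattern can be sketched in a single process as a rotation of per-rank values. This is an illustration only: it does not call colossalai, `simulate_ring_forward` is a hypothetical helper, and it assumes the group wraps around so that the last rank sends to rank 0.

```python
# Single-process simulation of a ring exchange.
# `simulate_ring_forward` is a hypothetical helper, not part of colossalai.

def simulate_ring_forward(tensors_by_rank):
    """Return, for each rank, the value sent by its previous neighbour in the ring."""
    n = len(tensors_by_rank)
    # Rank r receives what rank (r - 1) mod n sent to its next member.
    return [tensors_by_rank[(rank - 1) % n] for rank in range(n)]

# Four ranks, each sending a labelled placeholder instead of a real tensor.
sent = ["t0", "t1", "t2", "t3"]
received = simulate_ring_forward(sent)  # received[r] is what rank r gets back
```

Applying the exchange once per rank in the group returns every value to the rank that originally sent it.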
- colossalai.communication.send_tensor_meta(tensor, need_meta=True, next_rank=None)
Sends tensor meta information before sending a specific tensor. Since the recipient must know the shape of the tensor in p2p communications, meta information of the tensor should be sent before communications. This function synchronizes with recv_tensor_meta().
- Parameters
tensor (Tensor) – Tensor to be sent
need_meta (bool, optional) – If False, meta information won’t be sent
next_rank (int) – The rank of the next member in the pipeline parallel group
- Returns
False
- Return type
bool
- colossalai.communication.recv_tensor_meta(tensor_shape, prev_rank=None)
Receives tensor meta information before receiving a specific tensor. Since the recipient must know the shape of the tensor in p2p communications, meta information of the tensor should be received before communications. This function synchronizes with send_tensor_meta().
- Parameters
tensor_shape (torch.Size) – The shape of the tensor to be received
prev_rank (int, optional) – The rank of the source of the tensor
- Returns
The shape of the tensor to be received
- Return type
torch.Size
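The shape handshake described for send_tensor_meta() and recv_tensor_meta() can be sketched with an in-memory queue standing in for the p2p link. This is a simplified model, not the library's implementation: `link`, `fake_send_meta`, and `fake_recv_meta` are hypothetical names. Two behaviours here are assumptions of this sketch: the receiver only reads from the link when no shape is known yet, and the sender feeds the returned flag back in as `need_meta` on later calls.

```python
from collections import deque

# `link`, `fake_send_meta`, and `fake_recv_meta` are hypothetical stand-ins;
# nothing here calls colossalai or torch.distributed.
link = deque()  # models the p2p channel between two adjacent pipeline stages

def fake_send_meta(shape, need_meta=True):
    """Push the shape onto the link unless the receiver already knows it."""
    if need_meta:
        link.append(tuple(shape))
    # Mirrors the documented return value of send_tensor_meta.
    return False

def fake_recv_meta(tensor_shape=None):
    """Assumption: a shape is only read from the link when none is known yet."""
    if tensor_shape is None:
        tensor_shape = link.popleft()
    return tensor_shape

# First micro-batch: the sender announces the shape, the receiver reads it.
need_meta = fake_send_meta((4, 8), need_meta=True)
shape = fake_recv_meta(None)

# Later micro-batches: both sides reuse the known shape, so nothing is sent.
need_meta = fake_send_meta((4, 8), need_meta=need_meta)
shape_again = fake_recv_meta(shape)
```

Once the shape is known on both sides, the receiver can allocate a buffer of that shape before each tensor transfer without another metadata exchange.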