colossalai.communication

colossalai.communication.all_gather(tensor, dim, parallel_mode, async_op=False)

Gathers the tensors from all members of the parallel group and concatenates them along a given dimension.

Parameters

tensor (torch.Tensor) – Tensor to be gathered
dim (int) – The dimension along which the gathered tensors are concatenated
parallel_mode (colossalai.context.ParallelMode) – Parallel group mode used in this communication
async_op (bool, optional) – Whether the operation is asynchronous

Returns

The tensor generated by all-gather

Return type

torch.Tensor
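The effect of the dim argument can be illustrated without a distributed setup. The following is a minimal single-process sketch that uses nested lists in place of tensors; simulate_all_gather is a hypothetical helper written for this illustration and is not part of ColossalAI.

```python
from typing import List

Matrix = List[List[int]]


def simulate_all_gather(shards: List[Matrix], dim: int) -> Matrix:
    """Concatenate one 2-D shard per rank along `dim` (0 = rows, 1 = columns)."""
    if dim == 0:
        # Stack the rows of every rank's shard, in rank order.
        return [list(row) for shard in shards for row in shard]
    if dim == 1:
        # Join row i of every rank's shard into one longer row.
        return [sum((shard[i] for shard in shards), []) for i in range(len(shards[0]))]
    raise ValueError("this sketch only handles 2-D shards, so dim must be 0 or 1")


# Two simulated ranks, each holding a 2x2 shard.
shards = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
gathered_rows = simulate_all_gather(shards, dim=0)  # 4x2 result
gathered_cols = simulate_all_gather(shards, dim=1)  # 2x4 result
```

In a real all-gather every member of the group ends up with the same concatenated result; the sketch computes that result once.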

colossalai.communication.reduce_scatter(tensor, dim, parallel_mode, op=<ReduceOp.SUM: 0>, async_op=False)

Reduces the tensors across the parallel group, then scatters the result along a given dimension to all members of the group.

Parameters

tensor (torch.Tensor) – Tensor to be reduced and scattered
dim (int) – The dimension along which the reduced tensor is scattered
parallel_mode (colossalai.context.ParallelMode) – Parallel group mode used in this communication
op (ReduceOp, optional) – The type of reduce operation
async_op (bool, optional) – Whether the operation is asynchronous

Returns

The tensor generated by reduce-scatter

Return type

torch.Tensor
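Reduce-scatter can be read as an element-wise reduction followed by a split of the result among the ranks. The sketch below simulates this in a single process with nested lists and a sum reduction. simulate_reduce_scatter is a hypothetical helper, and the sketch assumes the size of the scattered dimension is divisible by the number of ranks.

```python
from typing import List

Matrix = List[List[int]]


def simulate_reduce_scatter(inputs: List[Matrix], dim: int) -> List[Matrix]:
    """Sum one 2-D tensor per rank element-wise, then split the sum along `dim`.

    Entry r of the returned list is the chunk that rank r would keep.
    Assumes the size of `dim` is divisible by the number of ranks.
    """
    world_size = len(inputs)
    rows, cols = len(inputs[0]), len(inputs[0][0])
    # Reduce step: element-wise sum across all ranks.
    total = [[sum(t[i][j] for t in inputs) for j in range(cols)] for i in range(rows)]
    if dim == 0:
        step = rows // world_size
        return [total[r * step:(r + 1) * step] for r in range(world_size)]
    if dim == 1:
        step = cols // world_size
        return [[row[r * step:(r + 1) * step] for row in total] for r in range(world_size)]
    raise ValueError("this sketch only handles 2-D tensors, so dim must be 0 or 1")


# Two simulated ranks, each contributing a 2x2 tensor.
inputs = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
chunks_dim0 = simulate_reduce_scatter(inputs, dim=0)  # each rank keeps one row
chunks_dim1 = simulate_reduce_scatter(inputs, dim=1)  # each rank keeps one column
```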

colossalai.communication.send_forward(output_tensor, next_rank=None, scatter_gather_tensors=False)

Sends the output tensor to the next member in the pipeline, which receives it as its input tensor.

Parameters

output_tensor (torch.Tensor) – Tensor to be sent
next_rank (int, optional) – The rank of the recipient of the tensor

colossalai.communication.send_forward_recv_forward(output_tensor, input_tensor_shape, recv_prev=True, prev_rank=None, next_rank=None, dtype=torch.float32, scatter_gather_tensors=False)

Batched communication operation. Sends the output tensor to the next member in the pipeline while receiving the input tensor from the previous member in the pipeline.

Parameters

output_tensor (torch.Tensor) – Tensor to be sent
input_tensor_shape (torch.Size) – The shape of the tensor to be received

Returns

The input tensor for the forward step

Return type

torch.Tensor

colossalai.communication.send_forward_backward_recv_forward_backward(output_tensor, input_tensor_grad, input_tensor_shape, output_grad_shape, recv_prev=True, recv_next=True, prev_rank=None, next_rank=None, dtype=torch.float32, scatter_gather_tensors=False)

Batched communication operation. Sends the output tensor to the next member and the grad tensor to the previous member, while receiving the grad tensor from the next member and the input tensor from the previous member.

Parameters

output_tensor (torch.Tensor) – Tensor sent to the next member
input_tensor_grad (torch.Tensor) – Tensor sent to the previous member
input_tensor_shape (torch.Size) – The shape of the tensor received from the previous member
output_grad_shape (torch.Size) – The shape of the tensor received from the next member

Returns

(the input tensor for the forward step, the grad of the output tensor of the forward step)

Return type

(torch.Tensor, torch.Tensor)
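The direction of each transfer in this batched exchange can be pictured with a single-process sketch. Strings stand in for tensors, and simulate_exchange is a hypothetical helper, not a ColossalAI function. The sketch assumes that the first stage receives no input tensor and the last stage receives no grad tensor, which it represents as None.

```python
from typing import List, Optional, Tuple


def simulate_exchange(
    outputs: List[str], input_grads: List[str]
) -> List[Tuple[Optional[str], Optional[str]]]:
    """For each pipeline stage, return (received input, received output grad)."""
    n = len(outputs)
    results = []
    for stage in range(n):
        # Forward data arrives from the previous stage; the first stage has none.
        recv_input = outputs[stage - 1] if stage > 0 else None
        # Gradient data arrives from the next stage; the last stage has none.
        recv_grad = input_grads[stage + 1] if stage < n - 1 else None
        results.append((recv_input, recv_grad))
    return results


# What each of three simulated stages sends to its neighbours.
outputs = ["act0", "act1", "act2"]          # sent to the next stage
input_grads = ["grad0", "grad1", "grad2"]   # sent to the previous stage
received = simulate_exchange(outputs, input_grads)
```

The middle stage is the only one that both sends and receives in both directions, which is the case the batched call is meant to cover.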

colossalai.communication.send_backward(input_tensor_grad, prev_rank=None, scatter_gather_tensors=False)

Sends the grad tensor to the previous member in the pipeline.

Parameters

input_tensor_grad (torch.Tensor) – Tensor to be sent
prev_rank (int, optional) – The rank of the recipient of the tensor

colossalai.communication.send_backward_recv_backward(input_tensor_grad, output_grad_shape, recv_next=True, prev_rank=None, next_rank=None, dtype=torch.float32, scatter_gather_tensors=False)

Batched communication operation. Sends the grad tensor to the previous member in the pipeline while receiving the grad tensor from the next member in the pipeline.

Parameters

input_tensor_grad (torch.Tensor) – Tensor to be sent
output_grad_shape (torch.Size) – The shape of the tensor to be received

Returns

The grad of the output tensor of the forward step

Return type

torch.Tensor

colossalai.communication.send_backward_recv_forward(input_tensor_grad, input_tensor_shape, recv_prev=True, prev_rank=None, dtype=torch.float32, scatter_gather_tensors=False)

Batched communication operation. Sends the grad tensor to the previous member in the pipeline while receiving the input tensor from the previous member in the pipeline.

Parameters

input_tensor_grad (torch.Tensor) – Tensor to be sent
input_tensor_shape (torch.Size) – The shape of the tensor to be received

Returns

The input tensor for the forward step

Return type

torch.Tensor

colossalai.communication.send_forward_recv_backward(output_tensor, output_grad_shape, recv_next=True, next_rank=None, dtype=torch.float32, scatter_gather_tensors=False)

Batched communication operation. Sends the output tensor to the next member in the pipeline while receiving the grad tensor from the next member in the pipeline.

Parameters

output_tensor (torch.Tensor) – Tensor to be sent
output_grad_shape (torch.Size) – The shape of the tensor to be received

Returns

The grad of the output tensor of the forward step

Return type

torch.Tensor

colossalai.communication.recv_backward(output_grad_shape, next_rank=None, dtype=torch.float32, scatter_gather_tensors=False)

Receives the grad tensor from the next member in the pipeline.

Parameters

output_grad_shape (torch.Size) – The shape of the tensor to be received
next_rank (int, optional) – The rank of the source of the tensor

Returns

The grad of the output tensor of the forward step

Return type

torch.Tensor

colossalai.communication.recv_forward(input_tensor_shape, prev_rank=None, dtype=torch.float32, scatter_gather_tensors=False)

Receives the input tensor from the previous member in the pipeline.

Parameters

input_tensor_shape (torch.Size) – The shape of the tensor to be received
prev_rank (int, optional) – The rank of the source of the tensor

Returns

The input tensor for the forward step

Return type

torch.Tensor

colossalai.communication.ring_forward(tensor_send_next, parallel_mode)

Sends a tensor to the next member and receives a tensor from the previous member. This function returns the tensor received from the previous member.

Parameters

tensor_send_next (torch.Tensor) – Tensor sent to the next member
parallel_mode (colossalai.context.ParallelMode) – Parallel group mode used in this communication

Returns

The tensor received from the previous member

Return type

torch.Tensor
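The send-to-next, receive-from-previous pattern can be sketched in a single process. Strings stand in for tensors, and simulate_ring_forward is a hypothetical helper, not a ColossalAI function. The wrap-around from the last rank back to rank 0 is an assumption drawn from the word "ring" in the function name.

```python
from typing import List


def simulate_ring_forward(values: List[str]) -> List[str]:
    """Entry r of the result is what rank r receives from rank (r - 1) % n."""
    n = len(values)
    # Assumed ring topology: rank 0 receives from the last rank.
    return [values[(rank - 1) % n] for rank in range(n)]


# values[r] is the tensor that simulated rank r sends to its next member.
sent = ["t0", "t1", "t2", "t3"]
received = simulate_ring_forward(sent)
```

Calling the helper n times in a row, feeding each result back in, would return every value to the rank it started on.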

colossalai.communication.send_tensor_meta(tensor, need_meta=True, next_rank=None)

Sends a tensor's meta information before the tensor itself. In p2p communication the recipient must know the shape of the incoming tensor, so the meta information should be sent first. This function synchronizes with recv_tensor_meta().

Parameters

tensor (torch.Tensor) – Tensor to be sent
need_meta (bool, optional) – If False, meta information won’t be sent
next_rank (int, optional) – The rank of the next member in the pipeline parallel group

Returns

False

Return type

bool

colossalai.communication.recv_tensor_meta(tensor_shape, prev_rank=None)

Receives a tensor's meta information before the tensor itself. In p2p communication the recipient must know the shape of the incoming tensor, so the meta information should be received first. This function synchronizes with send_tensor_meta().

Parameters

tensor_shape (torch.Size) – The shape of the tensor to be received
prev_rank (int, optional) – The rank of the source of the tensor

Returns

The shape of the tensor to be received

Return type

torch.Size
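The pairing of send_tensor_meta() and recv_tensor_meta() can be pictured as a small handshake that precedes the tensor transfer. The sketch below is a single-process simulation in which a queue.Queue stands in for the p2p channel. The helpers send_meta and recv_meta are hypothetical, and two details are assumptions rather than documented behavior: that the wire format is the number of dimensions followed by the shape, and that the receiver skips the exchange when it already knows the shape.

```python
import queue
from typing import Optional, Tuple

Shape = Tuple[int, ...]


def send_meta(channel: "queue.Queue", shape: Shape, need_meta: bool = True) -> bool:
    """Put the number of dimensions, then the shape, on the channel."""
    if need_meta:
        channel.put(len(shape))    # assumed wire format: ndim first
        channel.put(tuple(shape))  # then the shape itself
    # Mirrors the documented return value of send_tensor_meta.
    return False


def recv_meta(channel: "queue.Queue", known_shape: Optional[Shape] = None) -> Shape:
    """Return `known_shape` if given, otherwise read the shape from the channel."""
    if known_shape is not None:
        # Assumption: nothing needs to be received when the shape is known.
        return known_shape
    ndim = channel.get()
    shape = channel.get()
    if len(shape) != ndim:
        raise ValueError("received shape does not match the announced ndim")
    return shape


channel: "queue.Queue" = queue.Queue()
flag = send_meta(channel, (4, 8))
shape = recv_meta(channel)
preset = recv_meta(channel, known_shape=(2, 3))
```

One possible reading of the False return value is that the caller can pass it back as need_meta on later calls so the shape is sent only once; this reading is an assumption, not something the reference above states.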