megengine.functional.distributed.reduce_scatter_sum#

reduce_scatter_sum(inp, group=WORLD, device=None, axis=0)[source]#

Reduce tensors across the specified group by summation, then split the result along the split axis and scatter one chunk to each process.

Parameters:
  • inp (Tensor) – Input tensor.

  • group (Optional[Group]) – The process group to work on. The default group is WORLD, which means all available processes. You can pass a list of process ranks to create a new group to work on, e.g. [1, 3, 5].

  • device (Optional[str]) – The specific device on which to execute this operator. The default None means the device of inp will be used. Specify "gpu0:1" to execute this operator on a different CUDA stream, where 1 is the stream id; the default stream id is 0.

  • axis – The axis along which the collective_comm result is split. The default axis is 0, i.e. the data is split along the first dimension.

Return type:

Tensor

Returns:

Split tensor.

Examples

input = Tensor([0, 1])
# Rank 0 # input: Tensor([0, 1])
# Rank 1 # input: Tensor([0, 1])
output = reduce_scatter_sum(input)
# Rank 0 # output: Tensor([0])
# Rank 1 # output: Tensor([2])

input = Tensor([0, 1])
group = Group([1, 0])
output = reduce_scatter_sum(input, group)
# Rank 0 # output: Tensor([2])
# Rank 1 # output: Tensor([0])
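The examples above require launching multiple distributed processes, so their results cannot be reproduced in a single interpreter. The semantics, however, can be sketched single-process with plain NumPy: sum the per-rank inputs elementwise, split the sum along the axis, and hand chunk i to the i-th rank in the group's order. The helper name `emulate_reduce_scatter_sum` is hypothetical and only illustrates the math; it is not part of the MegEngine API.

```python
import numpy as np

def emulate_reduce_scatter_sum(inputs, ranks=None, axis=0):
    """Single-process emulation of reduce_scatter_sum semantics.

    `inputs` maps rank -> array; `ranks` is the group order
    (default: sorted ranks, matching the WORLD group).
    """
    if ranks is None:
        ranks = sorted(inputs)
    # Step 1: reduce by sum across all ranks in the group.
    total = np.sum([inputs[r] for r in ranks], axis=0)
    # Step 2: split the reduced result into equal chunks along `axis`
    # and scatter chunk i to the i-th rank in the group order.
    chunks = np.split(total, len(ranks), axis=axis)
    return {rank: chunk for rank, chunk in zip(ranks, chunks)}

# Two ranks, each holding [0, 1]; the sum is [0, 2], split into [0] and [2].
out = emulate_reduce_scatter_sum({0: np.array([0, 1]), 1: np.array([0, 1])})
# out[0] -> [0], out[1] -> [2]

# With group order [1, 0], chunk 0 goes to rank 1 and chunk 1 to rank 0,
# matching the second example above.
out_rev = emulate_reduce_scatter_sum(
    {0: np.array([0, 1]), 1: np.array([0, 1])}, ranks=[1, 0]
)
# out_rev[0] -> [2], out_rev[1] -> [0]
```

Note that `np.split` requires the axis length to be divisible by the group size, mirroring the equal-chunk scatter performed by the collective.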