megengine.functional.distributed.all_reduce_max#

all_reduce_max(inp, group=WORLD, device=None)[source]#

Reduce tensors across the specified group by taking the element-wise maximum.

Note

The inp tensor must have the same shape in all processes across the group.

Parameters:

inp (Tensor) – tensor to be reduced.

Keyword Arguments:
  • group (Group or sequence of ints) – the process group to work on. Default: WORLD, which selects all available processes. Passing a list of process ranks instead creates a new group containing only those processes (a runnable sketch follows this list).

  • device (Tensor.device) – the specific device on which to execute this operator. Default: None, which selects the device of inp. For a GPU device, a different CUDA stream can be chosen by appending a colon and a stream number to the device name (e.g. gpu0:1); :0 denotes the default stream, which is also used when no stream number is given.
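The doctest examples further below assume a distributed context is already running (a defined rank and an initialized group). As an illustration only, a complete two-process run might look like the following sketch, which assumes a machine with at least two GPUs and uses megengine.distributed.launcher to spawn one worker per GPU:

import megengine.distributed as dist
import megengine.functional as F

@dist.launcher(n_gpus=2)    # assumption: two GPUs available; spawns one worker process per GPU
def worker():
    rank = dist.get_rank()
    inp = F.arange(2) + 1 + 2 * rank    # rank 0 holds [1., 2.], rank 1 holds [3., 4.]
    # group=[0, 1] builds a new group from ranks 0 and 1; device=None (the default)
    # executes on the device of inp.
    out = F.distributed.all_reduce_max(inp, group=[0, 1], device=None)
    print("rank", rank, out.numpy())    # both ranks print [3. 4.]

if __name__ == "__main__":
    worker()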

Return type:

Tensor

Returns:

A tensor holding the element-wise maximum of the inputs across the group.

The shape of the output tensor is the same as that of inp, and the output tensor is bitwise identical in all processes across the group.
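Conceptually, every rank receives the element-wise maximum of all per-rank inputs. A minimal single-process sketch (no distributed API involved) of the value computed for the two-rank inputs used in the Examples below:

import megengine.functional as F

a = F.arange(2) + 1    # what rank 0 holds: [1., 2.]
b = F.arange(2) + 3    # what rank 1 holds: [3., 4.]
print(F.maximum(a, b).numpy())    # [3. 4.] -- the value every rank receives from all_reduce_max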

Examples

>>> # We execute all_reduce_max on rank 0 and rank 1
>>> input = F.arange(2) + 1 + 2 * rank 
>>> input  
Tensor([1. 2.], device=xpux:0) # Rank 0
Tensor([3. 4.], device=xpux:0) # Rank 1
>>> F.distributed.all_reduce_max(input, group=[0, 1]) 
Tensor([3. 4.], device=xpux:0) # Rank 0
Tensor([3. 4.], device=xpux:0) # Rank 1
>>> # We execute all_reduce_max on gpu0 with CUDA stream 1
>>> megengine.set_default_device("gpu0") 
>>> input = F.arange(2) + 1 + 2 * rank 
>>> input  
Tensor([1. 2.], device=gpu0:0) # Rank 0
Tensor([3. 4.], device=gpu0:0) # Rank 1
>>> F.distributed.all_reduce_max(input, device="gpu0:1") 
Tensor([3. 4.], device=gpu0:1) # Rank 0
Tensor([3. 4.], device=gpu0:1) # Rank 1