megengine.distributed#

>>> import megengine.distributed as dist

backend

Get or set the backend of collective communication.

Group#

Server

Distributed Server for distributed training.

Group

Includes the ranked nodes that run collective communication (see megengine.distributed).

init_process_group

Initialize the distributed process group and specify the device used in the current process.
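
For example, a minimal manual-initialization sketch for rank 0 of a two-process job (the address, port, and device index are illustrative placeholders; most users rely on launcher instead):

>>> server = dist.Server(port=23456)  # rank 0 hosts the coordination server
>>> dist.init_process_group(
...     master_ip="localhost",        # illustrative address of the master node
...     port=23456,
...     world_size=2,
...     rank=0,
...     device=0,                     # device index bound to this process
... )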

new_group

Build a subgroup containing certain ranks.

group_barrier

Block until all ranks in the group reach this barrier.
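
Assuming the process group has been initialized on every rank, a subgroup and a barrier can be combined roughly as follows (ranks [0, 1] are an illustrative choice):

>>> group = dist.new_group([0, 1])   # subgroup containing ranks 0 and 1
>>> if dist.get_rank() in [0, 1]:
...     pass                         # collectives restricted to `group` go here
...
>>> dist.group_barrier()             # block until all ranks reach this point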

override_backend

Override the distributed backend.
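
A sketch of its use as a context manager (the backend name is an assumption; availability depends on your build):

>>> with dist.override_backend("nccl"):
...     pass   # collectives issued in this block use the overridden backend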

is_distributed

Return True if the distributed process group has been initialized.

get_backend

Get the backend string.

get_client

Get the client of the Python XML-RPC server.

get_mm_server_addr

Get the master_ip and port of the C++ mm_server.

get_py_server_addr

Get the master_ip and port of the Python XML-RPC server.

get_rank

Get the rank of the current process.

get_world_size

Get the total number of processes participating in the job.
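
Inside a worker these queries are typically combined, for example:

>>> if dist.is_distributed():
...     print(f"rank {dist.get_rank()} / {dist.get_world_size()}, backend: {dist.get_backend()}")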

Launcher#

launcher

Decorator for launching multiple processes in single-machine multi-GPU training.
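
A minimal sketch of the decorator form (n_gpus=2 is an illustrative value):

>>> @dist.launcher(n_gpus=2)
... def worker():
...     print("rank", dist.get_rank(), "of", dist.get_world_size())
...
>>> worker()   # spawns one process per GPU and runs worker in each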

Helper#

bcast_list_

Broadcast tensors within the given group.
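
A common use is broadcasting a module's parameters from rank 0 so every process starts from identical weights (the toy Linear module is for illustration):

>>> import megengine.module as M
>>> model = M.Linear(4, 4)                      # toy module for illustration
>>> dist.bcast_list_(list(model.parameters()))  # in-place broadcast within the default group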

synchronized

Decorator; the decorated function synchronizes all ranks in the group when it finishes.

make_allreduce_cb

Alias of AllreduceCallback.

helper.AllreduceCallback

Allreduce callback with tensor fusion optimization.
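
A sketch of the usual data-parallel pattern: attach the callback to a GradManager so gradients are reduced across ranks during backward (the toy Linear module and the "mean" reduce method are illustrative):

>>> import megengine.autodiff as autodiff
>>> import megengine.module as M
>>> model = M.Linear(4, 4)   # toy module for illustration
>>> gm = autodiff.GradManager().attach(
...     model.parameters(),
...     callbacks=dist.make_allreduce_cb("mean", dist.WORLD),
... )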

helper.param_pack_split

Splits a packed tensor into a list of tensors according to the described offsets and shapes; only used for parameter packing (parampack).

helper.param_pack_concat

Returns the concatenated tensor; only used for parameter packing (parampack).

helper.pack_allreduce_split