megengine.functional.nn.roi_align

roi_align(inp, rois, output_shape, mode='average', spatial_scale=1.0, sample_points=2, aligned=True)

Applies RoI (Region of Interest) Align to the input feature map, as described in Mask R-CNN.

Parameters:
  • inp (Tensor) – the input feature map, with shape (n, c, h, w).

  • rois (Tensor) – a tensor of shape (K, 5) representing K Regions of Interest, one box per row in (idx, x1, y1, x2, y2) format, from which the regions are taken. The corner coordinates (x1, y1) and (x2, y2) must satisfy 0 <= x1 < x2 and 0 <= y1 < y2. The first column, idx, holds the index of the corresponding element in the input batch, i.e. a number in [0, n - 1].

  • output_shape (Union[int, tuple, list]) – (height, width) of the output feature for each RoI.

  • mode (str) – “max” or “average”; selects max or average align, analogous to max or average pooling. Default: “average”

  • spatial_scale (float) – scale the input boxes by this number. Default: 1.0

  • sample_points (Union[int, tuple, list]) – number of input samples to take for each output sample; 0 takes samples densely. Default: 2

  • aligned (bool) – whether to align the input feature. With aligned=True, the RoI is first scaled appropriately and then shifted by -0.5. Default: True
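
The constraints on rois and the effect of spatial_scale and aligned can be illustrated with a small pure-Python sketch. The helpers check_rois and to_feature_coords are hypothetical names introduced here for illustration and are not part of MegEngine. The sketch assumes that aligned=False applies no shift, which the description above does not state explicitly.

```python
def check_rois(rois, batch_size):
    """Validate rows of (idx, x1, y1, x2, y2) against the documented constraints."""
    for row in rois:
        if len(row) != 5:
            raise ValueError(f"expected 5 values per RoI, got {len(row)}")
        idx, x1, y1, x2, y2 = row
        # idx must address an element of the input batch.
        if not 0 <= idx <= batch_size - 1:
            raise ValueError(f"idx {idx} is outside [0, {batch_size - 1}]")
        # Corners must be non-negative and strictly ordered.
        if not (0 <= x1 < x2 and 0 <= y1 < y2):
            raise ValueError(f"box ({x1}, {y1}, {x2}, {y2}) is not a valid RoI")


def to_feature_coords(roi, spatial_scale=1.0, aligned=True):
    """Scale a box into feature-map coordinates, then shift by -0.5 if aligned."""
    idx, x1, y1, x2, y2 = roi
    offset = 0.5 if aligned else 0.0  # assumption: no shift when aligned=False
    return (
        idx,
        x1 * spatial_scale - offset,
        y1 * spatial_scale - offset,
        x2 * spatial_scale - offset,
        y2 * spatial_scale - offset,
    )


# One box on a 4x-downsampled feature map (spatial_scale = 1 / 4).
rois = [[0, 16.0, 32.0, 80.0, 96.0]]
check_rois(rois, batch_size=1)
scaled = to_feature_coords(rois[0], spatial_scale=0.25)
print(scaled)  # (0, 3.5, 7.5, 19.5, 23.5)
```

Running a check like this before calling roi_align may help catch boxes with swapped corners or an out-of-range batch index early.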

Return type:

Tensor

Returns:

output tensor.

Examples

>>> import numpy as np
>>> import megengine.functional as F
>>> from megengine import Tensor
>>> np.random.seed(42)
>>> inp = Tensor(np.random.randn(1, 1, 128, 128))
>>> rois = Tensor(np.random.random((4, 5)))
>>> y = F.vision.roi_align(inp, rois, (2, 2))
>>> y.numpy()[0].round(decimals=4)
array([[[0.175 , 0.175 ],
        [0.1359, 0.1359]]], dtype=float32)
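
To make the "average" mode more concrete, the following is a minimal pure-Python sketch of RoI align on a single-channel feature map: each output bin averages sample_points × sample_points bilinearly interpolated samples. It is a reference illustration only, not MegEngine's kernel. The function names bilinear and roi_align_average are hypothetical, the box is assumed to be already in feature-map coordinates (after any scaling and shifting), out-of-range sample positions are clamped as an assumption, and neither sample_points=0 nor tuple-valued sample_points is handled.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate the 2-D list `feat` at the continuous point (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the valid range (an assumption about out-of-bounds handling).
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    top = feat[y0][x0] * (1 - lx) + feat[y0][x1] * lx
    bottom = feat[y1][x0] * (1 - lx) + feat[y1][x1] * lx
    return top * (1 - ly) + bottom * ly


def roi_align_average(feat, box, output_shape, sample_points=2):
    """Average-mode RoI align for one channel; box is (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    out_h, out_w = output_shape
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Place samples on a regular grid inside the bin.
            for sy in range(sample_points):
                for sx in range(sample_points):
                    y = y1 + (i + (sy + 0.5) / sample_points) * bin_h
                    x = x1 + (j + (sx + 0.5) / sample_points) * bin_w
                    total += bilinear(feat, y, x)
            row.append(total / (sample_points * sample_points))
        out.append(row)
    return out


# Horizontal ramp: feat[y][x] == x, so each bin averages its x positions.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
pooled = roi_align_average(ramp, (0.0, 0.0, 4.0, 4.0), (2, 2))
print(pooled)  # [[1.0, 3.0], [1.0, 3.0]]
```

On the ramp input, the left bins sample x = 0.5 and 1.5 and the right bins sample x = 2.5 and 3.5, which gives the averages 1.0 and 3.0.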