Conv2d

class Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, conv_mode='cross_correlation', compute_mode='default', padding_mode='zeros', **kwargs)

Applies a 2D convolution over an input tensor.

For instance, given an input of size \((N, C_{\text{in}}, H, W)\), this layer produces an output of size \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) through the following process:

\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]

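where \(\star\) is the valid 2D cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, \(H\) is the height of the input planes in pixels, and \(W\) is the width of the input planes in pixels.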
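The formula can be checked by hand with the minimal pure-Python sketch below. It is not MegEngine's implementation: it covers a single sample with stride 1, no padding, no dilation and groups=1, and the helper names `cross_correlate_valid` and `conv2d_single` are hypothetical. The kernel is applied without flipping, which is what makes the operation a cross-correlation rather than a textbook convolution.

```python
from typing import List

# A 2D plane is a list of rows; Matrix is a local alias, not a MegEngine type.
Matrix = List[List[float]]


def cross_correlate_valid(x: Matrix, k: Matrix) -> Matrix:
    """Valid 2D cross-correlation: slide k over x WITHOUT flipping it and keep
    only the positions where k lies entirely inside x."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(k[u][v] * x[i + u][j + v] for u in range(kh) for v in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv2d_single(
    inp: List[Matrix], weight: List[List[Matrix]], bias: List[float]
) -> List[Matrix]:
    """out[j] = bias[j] + sum over k of (weight[j][k] cross-correlated with inp[k]).

    inp has shape (C_in, H, W) and weight has shape (C_out, C_in, kh, kw).
    Hypothetical helper: stride 1, no padding, no dilation, groups=1.
    """
    outputs = []
    for j, filters in enumerate(weight):  # one filter bank per output channel
        acc = None
        for k, channel in enumerate(inp):  # sum over input channels
            part = cross_correlate_valid(channel, filters[k])
            if acc is None:
                acc = part
            else:
                acc = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(acc, part)]
        outputs.append([[v + bias[j] for v in row] for row in acc])
    return outputs


ch0 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ch1 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
w = [[[[1, 0], [0, -1]], [[1, 1], [1, 1]]]]  # C_out=1, C_in=2, 2x2 kernels
print(conv2d_single([ch0, ch1], w, [0.5]))  # [[[0.5, 0.5], [0.5, 0.5]]]
```

Flipping the first kernel (a true convolution) would turn that channel's contribution from -4 into +4, giving 8.5 at every position instead of 0.5.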

In general, the shape of the output feature map can be derived as follows:

input: \((N, C_{\text{in}}, H_{\text{in}}, W_{\text{in}})\)

output: \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\), where

\[H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor\]
\[W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor\]

When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also known as depthwise convolution.

In other words, for an input of size \((N, C_{\text{in}}, H_{\text{in}}, W_{\text{in}})\), a depthwise convolution with a depthwise multiplier K can be constructed with the arguments \((in\_channels=C_{\text{in}}, out\_channels=C_{\text{in}} \times K, ..., groups=C_{\text{in}})\).
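To make the channel bookkeeping concrete, the sketch below lists which input channels each output channel reads under grouped convolution, and builds the depthwise argument set described above. The function names are hypothetical, and the assumption that output channels are ordered group by group follows the (groups, out_channels // groups, ...) weight layout documented for this layer; it is not taken from MegEngine's source.

```python
from typing import Dict, List


def input_channels_seen(in_channels: int, out_channels: int, groups: int) -> List[List[int]]:
    """For each output channel, list the input channels its filter reads."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    in_per_group = in_channels // groups
    out_per_group = out_channels // groups
    seen = []
    for o in range(out_channels):
        g = o // out_per_group  # assumed: output channels are laid out group by group
        seen.append(list(range(g * in_per_group, (g + 1) * in_per_group)))
    return seen


def depthwise_kwargs(c_in: int, multiplier: int) -> Dict[str, int]:
    """Hypothetical helper: arguments for a depthwise convolution with multiplier K."""
    return {"in_channels": c_in, "out_channels": c_in * multiplier, "groups": c_in}


print(input_channels_seen(4, 4, groups=2))  # [[0, 1], [0, 1], [2, 3], [2, 3]]

kw = depthwise_kwargs(3, multiplier=2)
print(kw)                         # {'in_channels': 3, 'out_channels': 6, 'groups': 3}
print(input_channels_seen(**kw))  # [[0], [0], [1], [1], [2], [2]]
```

In the depthwise case each output channel reads exactly one input channel. Passing the same dictionary to `M.Conv2d` together with a `kernel_size` should build the corresponding layer, but that call is not exercised in this sketch.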

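Parameters
  • in_channels (int) – number of channels in the input data.

  • out_channels (int) – number of channels in the output data.

  • kernel_size (Union[int, Tuple[int, int]]) – size of the weight in the spatial dimensions. If kernel_size is an int, the actual kernel size is (kernel_size, kernel_size).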

  • stride (Union[int, Tuple[int, int]]) – stride of the 2D convolution operation. Default: 1.

  • padding (Union[int, Tuple[int, int]]) – size of the paddings added to the input on both sides of its spatial dimensions. Default: 0.

  • dilation (Union[int, Tuple[int, int]]) – dilation of the 2D convolution operation. Default: 1.

  • groups (int) – number of groups into which the input and output channels are divided, so as to perform a grouped convolution. When groups is not 1, in_channels and out_channels must be divisible by groups, and the shape of weight should be (groups, out_channels // groups, in_channels // groups, height, width). Default: 1.

  • bias (bool) – whether to add a bias onto the result of convolution. Default: True.

  • conv_mode (str) – supports cross_correlation. Default: cross_correlation.

  • compute_mode (str) – when set to “default”, no special requirements are placed on the precision of intermediate results. When set to “float32”, float32 is used for the accumulator and intermediate results; this only takes effect when the input and output are of float16 dtype. Default: “default”.

  • padding_mode (str) – “zeros”, “reflect” or “replicate”. Default: “zeros”. See Pad for more information.
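The three `padding_mode` values differ only in what they write into the padded border. The 1D sketch below shows the idea on a single row; a 2D pad applies the same rule along height and width. The function `pad_row` is hypothetical, and the assumption that “reflect” mirrors without repeating the border element (as NumPy's "reflect" mode does) should be confirmed against the Pad documentation.

```python
from typing import List


def pad_row(row: List[float], pad: int, mode: str = "zeros") -> List[float]:
    """Pad a 1D row with `pad` elements on each side using a padding_mode name."""
    if mode == "zeros":
        left = [0] * pad
        right = [0] * pad
    elif mode == "reflect":
        # Assumed NumPy-style "reflect": mirror around the border element
        # without repeating it; this requires pad < len(row).
        if pad >= len(row):
            raise ValueError("reflect padding must be smaller than the row length")
        left = row[pad:0:-1]
        right = row[-2:-pad - 2:-1]
    elif mode == "replicate":
        # Repeat the border element.
        left = [row[0]] * pad
        right = [row[-1]] * pad
    else:
        raise ValueError(f"unsupported padding_mode: {mode!r}")
    return left + row + right


for mode in ("zeros", "reflect", "replicate"):
    print(mode, pad_row([1, 2, 3, 4], 2, mode))
# zeros [0, 0, 1, 2, 3, 4, 0, 0]
# reflect [3, 2, 1, 2, 3, 4, 3, 2]
# replicate [1, 1, 1, 2, 3, 4, 4, 4]
```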

Shape:

input: \((N, C_{\text{in}}, H_{\text{in}}, W_{\text{in}})\). output: \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\).

Note

  • The shape of weight is usually (out_channels, in_channels, height, width);

    if groups is not 1, the shape should be (groups, out_channels // groups, in_channels // groups, height, width).

  • The shape of bias is usually (1, out_channels, *1).
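The shape rules in this note can be written down as a small helper, useful for sanity-checking grouped and depthwise configurations. `conv2d_param_shapes` is a hypothetical name, and reading `*1` as one trailing 1 per spatial dimension, i.e. a bias of shape (1, out_channels, 1, 1), is an assumption.

```python
from typing import Tuple, Union

Shape = Tuple[int, ...]


def conv2d_param_shapes(
    in_channels: int,
    out_channels: int,
    kernel_size: Union[int, Tuple[int, int]],
    groups: int = 1,
) -> Tuple[Shape, Shape]:
    """Hypothetical helper returning (weight_shape, bias_shape) per the note."""
    if isinstance(kernel_size, int):
        kh, kw = kernel_size, kernel_size
    else:
        kh, kw = kernel_size
    if groups == 1:
        weight = (out_channels, in_channels, kh, kw)
    else:
        if in_channels % groups or out_channels % groups:
            raise ValueError("in_channels and out_channels must be divisible by groups")
        weight = (groups, out_channels // groups, in_channels // groups, kh, kw)
    bias = (1, out_channels, 1, 1)  # assumed: "*1" expands to one 1 per spatial dim
    return weight, bias


print(conv2d_param_shapes(3, 1, 3))                 # ((1, 3, 3, 3), (1, 1, 1, 1))
print(conv2d_param_shapes(6, 4, 3, groups=2))       # ((2, 2, 3, 3, 3), (1, 4, 1, 1))
print(conv2d_param_shapes(3, 6, (3, 5), groups=3))  # ((3, 2, 1, 3, 5), (1, 6, 1, 1))
```

The first call corresponds to the layer in the example below; on a real layer, comparing against `m.weight.shape` and `m.bias.shape` is the authoritative check.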

Returns

module. The instance of the Conv2d module.

Return type

Examples

>>> import numpy as np
>>> import megengine as mge
>>> import megengine.module as M
>>> m = M.Conv2d(in_channels=3, out_channels=1, kernel_size=3)
>>> inp = mge.tensor(np.arange(0, 96).astype("float32").reshape(2, 3, 4, 4))
>>> oup = m(inp)
>>> oup.numpy().shape
(2, 1, 2, 2)