BatchNorm2d

class BatchNorm2d(num_features, eps=1e-5, momentum=0.9, affine=True, track_running_stats=True, freeze=False, **kwargs)

Applies Batch Normalization over a 4D tensor.

\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

The mean and variance are calculated per channel over each mini-batch, and \(\gamma\) and \(\beta\) are learnable parameter vectors of size \(C\) (where \(C\) is num_features).

By default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.9.

If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation as well.

Because Batch Normalization is applied per channel (the C dimension), with statistics computed over (N, H, W) slices, it is commonly called Spatial Batch Normalization.
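The per-channel computation can be sketched in plain NumPy. This is an illustration of the formula above, not the library's implementation; the helper name batch_norm_2d is hypothetical, and the use of the biased (population) variance is an assumption.

```python
import numpy as np

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Normalize an (N, C, H, W) array per channel using batch statistics.

    Hypothetical helper for illustration only.
    """
    # Reduce over N, H and W so that one mean/variance remains per channel.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)  # biased variance (assumption)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta have shape (C,) and are broadcast over N, H and W.
    return x_hat * gamma.reshape(1, -1, 1, 1) + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.random((8, 4, 3, 3)).astype("float32")
gamma = np.ones(4, dtype="float32")
beta = np.zeros(4, dtype="float32")
y = batch_norm_2d(x, gamma, beta)
print(y.shape)
```

With gamma set to ones and beta to zeros, each channel of the output has approximately zero mean and unit variance.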

Note

The update formula for running_mean and running_var (taking running_mean as an example) is

\[\textrm{running_mean} = \textrm{momentum} \times \textrm{running_mean} + (1 - \textrm{momentum}) \times \textrm{batch_mean}\]

which may be defined differently in other frameworks. Most notably, a momentum of 0.1 in PyTorch is equivalent to a momentum of 0.9 here.
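The two momentum conventions can be compared with a few lines of plain Python. Both function names below are hypothetical helpers written for this illustration; the second one assumes the PyTorch convention, in which momentum weights the new batch statistic rather than the old running value.

```python
def update_running_mean(running_mean, batch_mean, momentum=0.9):
    # Convention used on this page: momentum weights the old running value.
    return momentum * running_mean + (1 - momentum) * batch_mean

def update_running_mean_pytorch_style(running_mean, batch_mean, momentum=0.1):
    # PyTorch convention (assumed here): momentum weights the new batch value.
    return (1 - momentum) * running_mean + momentum * batch_mean

a = update_running_mean(2.0, 4.0, momentum=0.9)
b = update_running_mean_pytorch_style(2.0, 4.0, momentum=0.1)
print(abs(a - b) < 1e-12)
```

Both calls move the running mean one tenth of the way from 2.0 toward 4.0, so converting between the two conventions amounts to replacing momentum with 1 - momentum.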

Parameters:

num_features – usually \(C\) from an input of shape \((N, C, H, W)\) or the highest ranked dimension of an input less than 4D.
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Default: 0.9
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – when set to True, this module tracks the running mean and variance. When set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
freeze – when set to True, this module does not update the running mean and variance, and uses the running mean and variance instead of the batch mean and batch variance to normalize the input. The parameter takes effect only when the module is initialized with track_running_stats set to True. Default: False
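The interaction between track_running_stats, freeze, and the training/evaluation mode described above can be summarized as a small decision sketch. The two functions are hypothetical and only restate the parameter descriptions; they are not taken from the library's source.

```python
def uses_batch_statistics(training, track_running_stats=True, freeze=False):
    """Return True if the input would be normalized with batch statistics."""
    if not track_running_stats:
        # No running estimates exist, so batch statistics are always used.
        return True
    if freeze:
        # Frozen layers normalize with the stored running statistics.
        return False
    # Otherwise: batch statistics in training, running statistics in evaluation.
    return training

def updates_running_statistics(training, track_running_stats=True, freeze=False):
    """Return True if running_mean and running_var would be updated."""
    return training and track_running_stats and not freeze

for training in (True, False):
    for track in (True, False):
        for freeze in (True, False):
            print(training, track, freeze,
                  uses_batch_statistics(training, track, freeze),
                  updates_running_statistics(training, track, freeze))
```

Note that freeze is ignored in this sketch whenever track_running_stats is False, matching the statement that it only takes effect when running statistics are tracked.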

Examples

>>> import numpy as np
>>> import megengine as mge
>>> import megengine.module as M
>>> # With Learnable Parameters
>>> m = M.BatchNorm2d(4)
>>> inp = mge.tensor(np.random.rand(1, 4, 3, 3).astype("float32"))
>>> oup = m(inp)
>>> print(m.weight.numpy().flatten(), m.bias.numpy().flatten())
[1. 1. 1. 1.] [0. 0. 0. 0.]
>>> # Without Learnable Parameters
>>> m = M.BatchNorm2d(4, affine=False)
>>> oup = m(inp)
>>> print(m.weight, m.bias)
None None