GradScaler

class GradScaler(init_scale=2.0 ** 4, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000)

A helper class that performs grad scaling to prevent data overflow in autocast mode.

Parameters:
  • init_scale (float) – Initial scale factor.

  • growth_factor (float) – Factor by which the scale is multiplied at each update stage. If growth_factor is 0, scale_factor will never be updated.

  • backoff_factor (float) – Factor by which the scale is multiplied when an overflowed grad is encountered.

  • growth_interval (int) – The interval between two scale update stages (see the sketch after this list).
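
Taken together, these parameters describe a dynamic loss-scaling rule. The following is a minimal sketch of the intended semantics only, not MegEngine's actual implementation; overflow and steps_since_growth are hypothetical names:

# One scale update stage, in illustrative pseudo-Python (hypothetical variables):
if overflow:                                   # an inf/nan grad was encountered
    scale_factor *= backoff_factor             # back off to a smaller scale
    steps_since_growth = 0
elif growth_factor != 0:
    steps_since_growth += 1
    if steps_since_growth >= growth_interval:  # stable for a whole interval
        scale_factor *= growth_factor          # grow the scale
        steps_since_growth = 0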

Example

import megengine
import megengine.functional as F
from megengine.amp import GradScaler, autocast
from megengine.autodiff import GradManager

gm = GradManager()
opt = ...  # an Optimizer, e.g. megengine.optimizer.SGD(...)
scaler = GradScaler()

gm.attach(model.parameters())  # model: a Module defined elsewhere

@autocast()
def train_step(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        scaler.backward(gm, loss)  # scale, backward, unscale and update in one call
    opt.step().clear_grad()
    return loss

If more flexible usage is needed, scaler.backward can be split into three steps:

@autocast()
def train_step(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        # backward with a scaled dy instead of calling scaler.backward
        gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
    scaler.unscale(gm.attached_tensors())  # divide the grads by the scale factor
    scaler.update()                        # grow or back off the scale factor
    opt.step().clear_grad()
    return loss

This is useful when grads need to be accumulated over multiple batches, as shown in the sketch below.
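
A sketch of such accumulation, assuming a hypothetical iterable batches of (image, label) pairs and reusing gm, scaler, model and opt from the examples above:

@autocast()
def train_multi_step(batches):
    for image, label in batches:
        with gm:
            logits = model(image)
            loss = F.nn.cross_entropy(logits, label)
            # grads keep accumulating across batches until clear_grad is called
            gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
    scaler.unscale(gm.attached_tensors())  # unscale the accumulated grads once
    scaler.update()
    opt.step().clear_grad()
    return loss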

backward(gm, y=None, dy=None, *, unscale_grad=True, update_scale='if_unscale_grad')

A wrapper of GradManager’s backward that scales y’s grad before backward and unscales the parameters’ grads afterwards.

Parameters:
  • gm (GradManager) – The GradManager to be wrapped.

  • y (Union[Tensor, List[Tensor], None]) – Same as GradManager backward’s y.

  • dy (Union[Tensor, List[Tensor], None]) – Same as GradManager backward’s dy. Will be multiplied by scale_factor.

  • unscale_grad (bool) – Whether to unscale grads at the same time. Can be False if grads need to be accumulated (see the sketch after this list).

  • update_scale (bool) – Whether to also update the scale factor after unscaling. Will be ignored if unscale_grad is False.
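
Typical calls, reusing gm and loss from the examples above (a sketch; both calls belong inside a with gm: block):

# default: scale loss's grad, run gm.backward, then unscale and update
scaler.backward(gm, loss)

# accumulation: keep grads scaled for now; call unscale and update later
scaler.backward(gm, loss, unscale_grad=False)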

unscale(grad_tensors)

Unscale the grads of all grad_tensors.

Parameters:
  • grad_tensors (Iterable[Tensor]) – Tensors whose grads need to be unscaled. Should be all tensors that are affected by the target tensor in GradManager’s backward.

update(new_scale=None)

Update the scale factor according to whether an overflowed grad was encountered. If new_scale is provided, the internal update mechanism is bypassed.
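
For example (a sketch, assuming new_scale accepts a float):

scaler.update()                # apply the internal growth/backoff rule
scaler.update(new_scale=16.0)  # force a specific scale, bypassing the rule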