GradScaler
- class GradScaler(init_scale=2.0 ** 4, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000)[source]
 A helper class that performs grad scaling to prevent data overflow in autocast mode.

 - Parameters
   init_scale (float) – Initial scale factor.
   growth_factor (float) – Factor that the scale is multiplied by in the actual update stage. If growth_factor is 0, scale_factor will not be updated.
   backoff_factor (float) – Factor that the scale is multiplied by when an overflowed grad is encountered.
   growth_interval (int) – The interval between two scale update stages (a sketch of this update policy follows the list).
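 How these three factors interact can be illustrated in plain Python. The snippet below is only a sketch of the dynamic scaling policy implied by the parameters above; the names sketch_update, found_overflow and growth_counter are hypothetical and not part of the GradScaler API.

 def sketch_update(scale_factor, found_overflow, growth_counter,
                   growth_factor=2.0, backoff_factor=0.5, growth_interval=2000):
     """Illustrative sketch only: returns (new_scale_factor, new_growth_counter)."""
     if found_overflow:
         # Overflowed grads: back off the scale and restart the growth interval.
         return scale_factor * backoff_factor, 0
     growth_counter += 1
     if growth_factor != 0 and growth_counter >= growth_interval:
         # A full interval without overflow: grow the scale.
         return scale_factor * growth_factor, 0
     return scale_factor, growth_counter

 With the default values, such a policy doubles the scale after 2000 consecutive overflow-free updates and halves it as soon as an overflow is detected.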
Example
gm = GradManager()
opt = ...
scaler = GradScaler()

gm.attach(model.parameters())

@autocast()
def train_step(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        scaler.backward(gm, loss)
    opt.step().clear_grad()
    return loss
If more flexible usage is needed, scaler.backward can be split into three lines:

@autocast()
def train_step(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
        scaler.unscale(gm.attached_tensors())
        scaler.update()
    opt.step().clear_grad()
    return loss

This is useful when grads need to be accumulated over multiple batches, as in the sketch below.
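For instance, a minimal sketch of accumulating grads over several batches before a single unscale, update and optimizer step might look as follows. It reuses gm, opt, scaler, model, F, autocast and megengine from the examples above; accum_steps, accum_step and dataloader are hypothetical names introduced only for illustration.

accum_steps = 4  # hypothetical number of batches to accumulate over

@autocast()
def accum_step(image, label):
    with gm:
        logits = model(image)
        # dividing by accum_steps averages the accumulated grads (optional)
        loss = F.nn.cross_entropy(logits, label) / accum_steps
        # scale each batch's grad; grads stay scaled and accumulate on the parameters
        gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
    return loss

for step, (image, label) in enumerate(dataloader):
    loss = accum_step(image, label)
    if (step + 1) % accum_steps == 0:
        scaler.unscale(gm.attached_tensors())  # unscale the accumulated grads once
        scaler.update()                        # adjust scale_factor for the next interval
        opt.step().clear_grad()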
- backward(gm, y=None, dy=None, *, unscale_grad=True, update_scale='if_unscale_grad')[source]
  A wrapper of GradManager's backward, used to scale y's grad and unscale parameters' grads.

  - Parameters
    gm (GradManager) – The GradManager to be wrapped.
    y (Union[Tensor, List[Tensor], None]) – Same as GradManager backward's y.
    dy (Union[Tensor, List[Tensor], None]) – Same as GradManager backward's dy. Will be multiplied by scale_factor.
    unscale_grad (bool) – Whether to do unscale at the same time. Could be False if grads need to be accumulated (see the sketch after this list).
    update_scale (bool) – Same as unscale's update. Will be ignored if unscale_grad is False.
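  As a sketch of how unscale_grad can serve grad accumulation, the wrapper form of the pattern shown earlier might look like the following. The accumulate flag is a hypothetical argument introduced only for illustration; gm, opt, scaler, model, F and autocast are assumed to be set up as in the examples above.

  @autocast()
  def train_step(image, label, accumulate):
      # accumulate=True for every batch except the last one of the group
      with gm:
          logits = model(image)
          loss = F.nn.cross_entropy(logits, label)
          # while accumulating, grads stay scaled; on the final batch they are
          # unscaled and, by the default update_scale behaviour, the scale is updated
          scaler.backward(gm, loss, unscale_grad=not accumulate)
      if not accumulate:
          opt.step().clear_grad()
      return loss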