SGD

class SGD(params, lr, momentum=0.0, nesterov=False, weight_decay=0.0)[source]

Implements stochastic gradient descent.

The optimizer supports optional momentum, including the Nesterov variant, and weight decay (L2 penalty).

Nesterov momentum is based on the formula from “On the importance of initialization and momentum in deep learning”.
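
A minimal NumPy sketch of that update, assuming the common formulation v = momentum * v + grad with weight decay folded into the gradient first; the function name and the explicit velocity argument are illustrative, not part of this API:

    import numpy as np

    def sgd_step(param, grad, velocity, lr, momentum=0.0, nesterov=False,
                 weight_decay=0.0):
        """One SGD update on a single parameter array (illustrative sketch)."""
        # Weight decay folds an L2 penalty into the gradient: g <- g + wd * p
        if weight_decay != 0.0:
            grad = grad + weight_decay * param
        if momentum != 0.0:
            # Standard momentum: v <- momentum * v + g, then p <- p - lr * v
            velocity = momentum * velocity + grad
            # Nesterov instead looks ahead: p <- p - lr * (g + momentum * v)
            grad = grad + momentum * velocity if nesterov else velocity
        return param - lr * grad, velocity

With momentum=0.0 both variants reduce to the plain update param = param - lr * grad.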

Parameters
  • params (Union[Iterable[Parameter], dict]) – Iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – Learning rate.

  • momentum (float) – Momentum factor. Default: 0.0.

  • nesterov (bool) – Enables Nesterov momentum. Default: False.

  • weight_decay (float) – Weight decay (L2 penalty). Default: 0.0.

Returns

An instance of the SGD optimizer.
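
To make the constructor concrete, here is a self-contained toy stand-in with the documented signature, exercised on a one-dimensional quadratic. It is a sketch of the expected behavior under the same assumptions as above, not the library's implementation:

    import numpy as np

    class SGD:
        """Toy stand-in mirroring the documented constructor (illustrative)."""
        def __init__(self, params, lr, momentum=0.0, nesterov=False,
                     weight_decay=0.0):
            self.params = list(params)   # any iterable of parameter arrays
            self.lr = lr
            self.momentum = momentum
            self.nesterov = nesterov
            self.weight_decay = weight_decay
            self.velocity = [np.zeros_like(p) for p in self.params]

        def step(self, grads):
            for i, (p, g) in enumerate(zip(self.params, grads)):
                if self.weight_decay != 0.0:
                    g = g + self.weight_decay * p
                if self.momentum != 0.0:
                    self.velocity[i] = self.momentum * self.velocity[i] + g
                    g = (g + self.momentum * self.velocity[i]
                         if self.nesterov else self.velocity[i])
                p -= self.lr * g         # update the parameter in place

    # Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
    w = np.array([0.0])
    opt = SGD([w], lr=0.1, momentum=0.9, nesterov=True)
    for _ in range(200):
        opt.step([2.0 * (w - 3.0)])
    print(w)   # approximately [3.]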

Note

Nesterov momentum takes effect only when momentum is greater than 0.0; with momentum=0.0 the update reduces to plain stochastic gradient descent regardless of the nesterov setting.