SGD
- class SGD(params, lr, momentum=0.0, nesterov=False, weight_decay=0.0)
Implements stochastic gradient descent.
Nesterov momentum is based on the formula from “On the importance of initialization and momentum in deep learning”.
- Parameters:
  - params (Union[Iterable[Parameter], dict]) – iterable of parameters to optimize or dicts defining parameter groups.
  - lr (float) – learning rate.
  - momentum (float) – momentum factor. Default: 0.0
  - nesterov (bool) – enables Nesterov momentum. Default: False
  - weight_decay (float) – weight decay (L2 penalty). Default: 0.0
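The update these parameters control can be sketched in plain Python. Note that `sgd_step` below is a hypothetical helper written for illustration, not this class's actual implementation: it folds the L2 penalty into the gradient, accumulates a momentum buffer, and applies the Nesterov look-ahead form from the Sutskever et al. formulation when `nesterov=True`.

```python
def sgd_step(param, grad, buf, lr, momentum=0.0, nesterov=False, weight_decay=0.0):
    """One SGD update on a single scalar parameter (illustrative sketch only).

    Returns the updated parameter and the updated momentum buffer.
    """
    if weight_decay != 0.0:
        # weight decay as an L2 penalty: add lambda * param to the gradient
        grad = grad + weight_decay * param
    # momentum buffer: buf <- momentum * buf + grad
    buf = momentum * buf + grad
    if nesterov:
        # Nesterov momentum: step along grad + momentum * buf (look-ahead)
        update = grad + momentum * buf
    else:
        update = buf
    return param - lr * update, buf

# Toy example: minimize f(w) = 0.5 * w**2, whose gradient is w itself.
w, buf = 1.0, 0.0
for _ in range(200):
    w, buf = sgd_step(w, w, buf, lr=0.1, momentum=0.9, nesterov=True)
```

With `lr=0.1` and `momentum=0.9` on this quadratic, the iterates spiral toward the minimum at `w = 0`; setting `momentum=0.0` reduces the step to plain gradient descent, `w <- (1 - lr) * w`.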