SGD

class SGD(params, lr, momentum=0.0, nesterov=False, weight_decay=0.0)

Implements stochastic gradient descent.

Nesterov momentum is based on the formula from “On the importance of initialization and momentum in deep learning” (Sutskever et al., 2013).

Parameters:
  • params (Union[Iterable[Parameter], dict]) – iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – learning rate.

  • momentum (float) – momentum factor. Default: 0.0

  • nesterov (bool) – enables Nesterov momentum. Default: False

  • weight_decay (float) – weight decay (L2 penalty). Default: 0.0
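The interaction of these parameters can be sketched as a single scalar update step. This is a minimal illustration of the conventional SGD-with-momentum update (weight decay folded into the gradient as an L2 penalty, a momentum buffer carried in a `state` dict), not this library's actual implementation; the function name and `state` dict are assumptions for illustration:

```python
def sgd_step(param, grad, state, lr,
             momentum=0.0, nesterov=False, weight_decay=0.0):
    """One SGD update for a scalar parameter (illustrative sketch)."""
    # Weight decay acts as an L2 penalty: g <- g + wd * p
    if weight_decay != 0.0:
        grad = grad + weight_decay * param
    if momentum != 0.0:
        # Momentum buffer: v <- mu * v + g
        buf = momentum * state.get("momentum_buffer", 0.0) + grad
        state["momentum_buffer"] = buf
        # Nesterov "looks ahead" along the buffer; plain momentum
        # uses the buffer directly as the step direction.
        grad = grad + momentum * buf if nesterov else buf
    # Gradient descent step: p <- p - lr * g
    return param - lr * grad


# Usage: two steps with momentum=0.9 on a constant gradient of 0.5.
state = {}
p = sgd_step(1.0, 0.5, state, lr=0.1, momentum=0.9)   # buffer becomes 0.5
p = sgd_step(p, 0.5, state, lr=0.1, momentum=0.9)     # buffer becomes 0.95
```

With `nesterov=True`, the effective step on the first call would use `g + momentum * v` instead of `v`, which is why Nesterov momentum requires a nonzero `momentum` to differ from plain SGD.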