AdamW
- class AdamW(params, lr, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2)[source]
Implements the AdamW algorithm proposed in “Decoupled Weight Decay Regularization”.
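As a reference, a sketch of the per-step decoupled update, following the paper's notation (this omits the paper's schedule multiplier; \(g_t\) is the gradient, \(m_t\) and \(v_t\) the first- and second-moment estimates, \(\eta\) the lr, \(\lambda\) the weight_decay — exact bias-correction details may differ by implementation):

```latex
% Decoupled weight-decay update (sketch; notation assumed from the paper)
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= m_t / (1 - \beta_1^t), \qquad
\hat{v}_t = v_t / (1 - \beta_2^t) \\
\theta_t &= \theta_{t-1}
  - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
  + \lambda\, \theta_{t-1} \right)
\end{aligned}
```

The key difference from Adam with an L2 penalty is the last line: the decay term \(\lambda\, \theta_{t-1}\) is applied directly to the weights rather than being folded into the gradient, so it is not rescaled by the adaptive \(\hat{v}_t\) denominator.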
- Parameters:
  - params (Union[Iterable[Parameter], dict]) – iterable of parameters to optimize or dicts defining parameter groups.
  - lr (float) – learning rate.
  - betas (Tuple[float, float]) – coefficients used for computing running averages of the gradient and its square. Default: (0.9, 0.999)
  - eps (float) – term added to the denominator to improve numerical stability. Default: 1e-8
  - weight_decay (float) – decoupled weight decay coefficient. Default: 1e-2
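
A minimal usage sketch, assuming a PyTorch-style optimizer API (the model, loss, and data below are illustrative placeholders, not part of this class's definition):

```python
import torch
import torch.nn as nn
from torch.optim import AdamW  # assuming the PyTorch implementation of this interface

model = nn.Linear(10, 2)
optimizer = AdamW(model.parameters(), lr=1e-3,
                  betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2)

criterion = nn.CrossEntropyLoss()
x = torch.randn(32, 10)              # dummy batch of 32 samples
target = torch.randint(0, 2, (32,))  # dummy class labels

for step in range(100):
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = criterion(model(x), target)
    loss.backward()                  # compute gradients
    optimizer.step()                 # Adam update plus decoupled weight decay

# The `params` argument also accepts dicts defining parameter groups,
# each with its own hyperparameter overrides:
optimizer = AdamW(
    [
        {"params": model.weight},
        {"params": model.bias, "weight_decay": 0.0},  # often no decay on biases
    ],
    lr=1e-3,
)
```

Parameter groups are the usual way to exempt biases and normalization weights from decay while keeping a single optimizer instance.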