LAMB
- class LAMB(params, lr, betas=(0.9, 0.999), eps=1e-08, bias_correction=True, weight_decay=0.0, always_adapt=False)
 Implements the LAMB algorithm.
LAMB was proposed in “Large Batch Optimization for Deep Learning: Training BERT in 76 minutes”.
- Parameters
 params (Union[Iterable[Parameter], dict]) – an iterable of parameters to optimize, or dicts defining parameter groups.
 lr (float) – learning rate.
 betas (Tuple[float, float]) – coefficients used for computing running averages of the gradient and its square. Default: (0.9, 0.999)
 eps (float) – term added to the denominator to improve numerical stability. Default: 1e-8
 bias_correction (bool) – enables bias correction by 1 - beta ** step. Default: True
 weight_decay (float) – weight decay (L2 penalty). Default: 0.0
 always_adapt (bool) – apply the adaptive learning rate even to parameters whose weight decay is 0.0. Default: False
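
To make the roles of these parameters concrete, the following is a minimal pure-Python sketch of one LAMB step as described in the paper. It is not this class's implementation: the function name lamb_step, the flat-list representation of a parameter, and the exact handling of zero norms are assumptions made for illustration only.

```python
import math


def lamb_step(w, g, m, v, step, lr, betas=(0.9, 0.999), eps=1e-8,
              bias_correction=True, weight_decay=0.0, always_adapt=False):
    """One LAMB update for a single parameter stored as a flat list.

    Hypothetical helper for illustration; not part of any library API.
    Returns the new parameter values and the updated running averages.
    """
    beta1, beta2 = betas

    # Running averages of the gradient and of its square (see `betas`).
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]

    # Optional bias correction by 1 - beta ** step (see `bias_correction`).
    c1 = 1 - beta1 ** step if bias_correction else 1.0
    c2 = 1 - beta2 ** step if bias_correction else 1.0

    # Adam-style direction; `eps` keeps the denominator away from zero.
    # The weight decay term is added to the direction itself.
    update = [(mi / c1) / (math.sqrt(vi / c2) + eps) + weight_decay * wi
              for mi, vi, wi in zip(m, v, w)]

    # Layer-wise trust ratio ||w|| / ||update||. Assumption: it is skipped
    # for parameters with zero weight decay unless `always_adapt` is set.
    ratio = 1.0
    if weight_decay != 0.0 or always_adapt:
        w_norm = math.sqrt(sum(wi * wi for wi in w))
        u_norm = math.sqrt(sum(ui * ui for ui in update))
        if w_norm > 0 and u_norm > 0:
            ratio = w_norm / u_norm

    new_w = [wi - lr * ratio * ui for wi, ui in zip(w, update)]
    return new_w, m, v


# Usage: first step on a two-element parameter with zero-initialized state.
w, m, v = lamb_step([3.0, 4.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0],
                    step=1, lr=0.1, always_adapt=True)
```

In this sketch, setting always_adapt=True scales the step by the ratio of the parameter norm to the update norm even though weight_decay is 0.0; with the default False, the same call reduces to an Adam-style step.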