mlx_optimizers.QHAdam

class QHAdam(learning_rate: float | Callable[[array], array], betas: List[float] = [0.9, 0.999], nus: List[float] = [1.0, 1.0], weight_decay: float = 0.0, decouple_weight_decay: bool = False, eps: float = 1e-08)

Quasi-Hyperbolic Adaptive Moment Estimation [1].

\[\begin{split}g_{t+1} &= \beta_1 g_t + (1 - \beta_1) \nabla_\theta L(\theta_t) \\ \theta_{t+1} &= \theta_t - \eta \left[ (1 - \nu_1) \nabla_\theta L(\theta_t) + \nu_1 g_{t+1} \right]\end{split}\]

[1] Ma, Jerry, and Denis Yarats. Quasi-hyperbolic momentum and Adam for deep learning. ICLR 2019. https://arxiv.org/abs/1810.06801 (code: facebookresearch/qhoptim)
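
To make the role of the \(\nu\) coefficients concrete, here is a minimal sketch of the quasi-hyperbolic step above, written with plain mlx.core arrays. It is illustrative only: the function name and values are invented, and it omits the Adam-style second-moment normalization that QHAdam also applies. Setting nu1 = 0 gives a plain gradient step, while nu1 = 1 gives a pure momentum step.

import mlx.core as mx

def qh_step(theta, grad, g, lr=1e-3, beta1=0.9, nu1=0.7):
    # Momentum buffer g_{t+1}: exponential moving average of the gradient.
    g = beta1 * g + (1 - beta1) * grad
    # Quasi-hyperbolic blend of the raw gradient and the moving average.
    update = (1 - nu1) * grad + nu1 * g
    return theta - lr * update, g

theta = mx.zeros((3,))
g = mx.zeros((3,))
grad = mx.array([0.5, -1.0, 2.0])
theta, g = qh_step(theta, grad, g)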

Parameters:
  • learning_rate (float or callable) – learning rate \(\eta\).

  • betas (Tuple[float, float], optional) – coefficients \((\beta_1, \beta_2)\) used for computing running averages of the gradient and its square. Default: (0.9, 0.999)

  • nus (Tuple[float, float], optional) – immediate discount factors used to estimate the gradient and its square \((\nu_1, \nu_2)\). Default: (1.0, 1.0)

  • weight_decay (float, optional) – weight decay coefficient. Default: 0.0

  • decouple_weight_decay (bool, optional) – whether to decouple weight decay from the gradient-based update, i.e. apply it directly to the parameters. Default: False

  • eps (float, optional) – term added to the denominator to improve numerical stability. Default: 1e-8
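
A typical training-loop usage, assuming QHAdam follows the standard mlx.optimizers.Optimizer interface (update() and state); the toy model, data, and hyperparameters below are illustrative only:

import mlx.core as mx
import mlx.nn as nn
import mlx_optimizers as optim

# Toy regression model and loss (placeholders for your own).
model = nn.Linear(8, 1)

def loss_fn(model, x, y):
    return nn.losses.mse_loss(model(x), y)

# nus=[0.7, 1.0] is a starting point suggested in [1].
optimizer = optim.QHAdam(learning_rate=1e-3, betas=[0.9, 0.999], nus=[0.7, 1.0])

x = mx.random.normal((32, 8))
y = mx.random.normal((32, 1))
loss, grads = nn.value_and_grad(model, loss_fn)(model, x, y)
optimizer.update(model, grads)          # one QHAdam step
mx.eval(model.parameters(), optimizer.state)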

[Figure: QHAdam on the Rosenbrock function (rosenbrock_QHAdam.png)]

Methods

__init__(learning_rate[, betas, nus, ...])

apply_single(gradient, parameter, state)
    To be extended by derived classes to implement the optimizer's update.

init_single(parameter, state)
    Initialize the optimizer state.
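
For orientation, here is a minimal sketch of how init_single and apply_single fit together when deriving from the mlx Optimizer base class. The ToyQHSGD class below is invented for illustration and is not the QHAdam implementation; it only mirrors the per-parameter state/update split listed above, using the quasi-hyperbolic blend from the equation at the top of this page.

import mlx.core as mx
from mlx.optimizers import Optimizer

class ToyQHSGD(Optimizer):
    # Illustrative only: quasi-hyperbolic momentum without a second moment.
    def __init__(self, learning_rate=1e-3, beta=0.9, nu=0.7):
        super().__init__()
        self.learning_rate = learning_rate
        self.beta = beta
        self.nu = nu

    def init_single(self, parameter, state):
        # Called once per parameter to set up its state: one momentum buffer.
        state["g"] = mx.zeros_like(parameter)

    def apply_single(self, gradient, parameter, state):
        # Called for each parameter on every update() with its gradient and state.
        lr = self.learning_rate.astype(gradient.dtype)
        g = self.beta * state["g"] + (1 - self.beta) * gradient
        state["g"] = g
        return parameter - lr * ((1 - self.nu) * gradient + self.nu * g)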