Optimizers

ADOPT(learning_rate, ...)
    ADaptive gradient method with the OPTimal convergence rate [1].

DiffGrad(learning_rate[, betas, ...])
    Difference of Gradients [1].

Muon(learning_rate=0.02, ...)
    MomentUm Orthogonalized by Newton-schulz [1].

MARS([learning_rate, betas, eps, ...])
    Make vAriance Reduction Shine [1].

QHAdam(learning_rate[, betas, nus, ...])
    Quasi-Hyperbolic Adaptive Moment Estimation [1].

MADGRAD(learning_rate[, momentum, ...])
    Momentumized, Adaptive, Dual averaged GRADient [1].

Lamb(learning_rate[, betas, weight_decay, eps])
    Layerwise Adaptive Large Batch Optimization [1].

Kron(learning_rate[, b1, weight_decay, ...])
    Kronecker-Factored Preconditioned Stochastic Gradient Descent [1].

Shampoo(learning_rate[, momentum, ...])
    Preconditioned Stochastic Tensor Optimization (general tensor case) [1].
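
The classes above are intended as drop-in optimizers for MLX models. The following is a minimal training-step sketch, assuming the classes are exposed by an mlx_optimizers package and follow the standard MLX Optimizer interface (an update(model, grads) method and a state attribute); the model, data, loss, and ADOPT hyperparameters are illustrative only.

    import mlx.core as mx
    import mlx.nn as nn
    import mlx_optimizers as optim  # assumed package name for the classes listed above

    # Toy model and loss, purely for illustration.
    model = nn.Linear(10, 1)

    def loss_fn(model, x, y):
        return nn.losses.mse_loss(model(x), y)

    loss_and_grad_fn = nn.value_and_grad(model, loss_fn)

    # Any optimizer from the table could be swapped in here (e.g. Muon, Lamb, Shampoo).
    optimizer = optim.ADOPT(learning_rate=1e-3)

    # One training step: compute loss and gradients, then update the model in place.
    x = mx.random.normal((32, 10))
    y = mx.random.normal((32, 1))
    loss, grads = loss_and_grad_fn(model, x, y)
    optimizer.update(model, grads)
    mx.eval(model.parameters(), optimizer.state)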