Optimizers
- ADOPT: ADaptive gradient method with the OPTimal convergence rate [1].
- diffGrad: Difference of Gradients [1].
- Muon: MomentUm Orthogonalized by Newton-Schulz [1].
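Muon's distinctive step orthogonalizes the momentum matrix with a few Newton-Schulz iterations before applying it as an update. A minimal NumPy sketch of the classic cubic Newton-Schulz iteration (Muon itself uses a tuned quintic polynomial; the coefficients and step count here are illustrative assumptions, not Muon's):

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=25):
    """Approximately replace g by the nearest (semi-)orthogonal matrix.

    Classic cubic Newton-Schulz iteration, an illustrative stand-in
    for Muon's tuned polynomial variant.
    """
    # Scale so all singular values are <= 1 (Frobenius norm bound),
    # which puts the iteration in its convergence region.
    x = g / (np.linalg.norm(g) + 1e-7)
    for _ in range(steps):
        # Each step pushes every singular value toward 1.
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

rng = np.random.default_rng(0)
m = newton_schulz_orthogonalize(rng.standard_normal((4, 4)))
# m @ m.T is now close to the identity matrix
```

The output has roughly unit singular values, so the update's magnitude is decoupled from the raw momentum's scale.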
- MARS: Make vAriance Reduction Shine [1].
- QHAdam: Quasi-Hyperbolic Adaptive Moment Estimation [1].
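QHAdam applies the quasi-hyperbolic rule to Adam's moment estimates. The rule is easiest to see in its plain-momentum form (QHM), sketched here in NumPy: the update direction interpolates between the raw gradient and the momentum buffer. The hyperparameter names `beta` and `nu` follow the quasi-hyperbolic papers; the specific values and the toy objective are illustrative assumptions:

```python
import numpy as np

def qhm_step(theta, grad, buf, lr=0.1, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum step.

    The step direction mixes the raw gradient (weight 1 - nu)
    with the EMA momentum buffer (weight nu).
    """
    buf = beta * buf + (1 - beta) * grad
    theta = theta - lr * ((1 - nu) * grad + nu * buf)
    return theta, buf

# Minimize f(x) = x^2 from x = 3; gradient is 2x.
x, buf = np.array([3.0]), np.zeros(1)
for _ in range(200):
    x, buf = qhm_step(x, 2 * x, buf)
```

Setting `nu = 1` recovers ordinary momentum and `nu = 0` recovers plain SGD, which is what makes the family a strict generalization.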
- MADGRAD: Momentumized, Adaptive, Dual averaged GRADient [1].
- LAMB: Layerwise Adaptive Large Batch Optimization [1].
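LAMB's layerwise ingredient is a trust ratio that rescales each layer's Adam-style update by the ratio of that layer's weight norm to its update norm. A sketch of just that scaling step in NumPy (the surrounding Adam machinery, ratio clamping, and weight decay are omitted, and the fallback ratio of 1.0 for zero norms is an assumption):

```python
import numpy as np

def lamb_scale(weight, adam_update):
    """Apply LAMB's layerwise trust ratio to one layer's update.

    The update is rescaled so its norm is proportional to the
    weight norm, keeping per-layer step sizes balanced at large
    batch sizes.
    """
    w_norm = np.linalg.norm(weight)
    u_norm = np.linalg.norm(adam_update)
    # Fall back to an unscaled update when either norm is zero.
    ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return ratio * adam_update

w = np.ones((3, 3))           # layer weights, norm 3
u = np.full((3, 3), 0.01)     # small raw update
scaled = lamb_scale(w, u)
# norm of scaled now matches the norm of w
```

Because the ratio is computed per layer, layers with small weights take proportionally small steps regardless of how the raw gradients are distributed across the network.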
- Kron: Kronecker-Factored Preconditioned Stochastic Gradient Descent [1].
- PSGD: Preconditioned Stochastic Tensor Optimization (general tensor case) [1].