An Adam-like optimizer for neural networks with adaptive estimation of the learning rate
Let net be the neural network you want to train. Then, you can use the method as follows:
from prodigyopt import Prodigy
# you can choose weight decay value based on your problem, 0 by default
opt = Prodigy(net.parameters(), lr=1., weight_decay=weight_decay)
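To show where the optimizer fits in training, here is a minimal sketch of a standard PyTorch loop; the tiny model, synthetic data, and loss function are placeholder assumptions for illustration, not part of Prodigy itself.

import torch
from prodigyopt import Prodigy

# Placeholder model and synthetic data purely for illustration.
net = torch.nn.Linear(10, 2)
data = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(10)]
loss_fn = torch.nn.CrossEntropyLoss()

# lr=1. is the recommended default; choose weight_decay for your problem (0 by default).
opt = Prodigy(net.parameters(), lr=1., weight_decay=0.01)

for x, y in data:
    opt.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    opt.step()  # Prodigy follows the standard PyTorch optimizer interface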
Note that by default, Prodigy applies decoupled weight decay as in AdamW. If you want it to use standard L2 regularization (as in Adam) instead, pass the option decouple=False.

We recommend using lr=1. (the default) for all networks. If you want to force the method to estimate a smaller or larger learning rate, it is better to change the value of d_coef (1.0 by default): values above 1, such as 2 or 10, force a larger learning-rate estimate, while values such as 0.5 or even 0.1 force a smaller one. Both options are illustrated in the sketch below.
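As a hedged sketch of these options (the particular values below are arbitrary examples, not recommendations):

from prodigyopt import Prodigy

# Use standard L2 regularization (as in Adam) instead of decoupled weight decay.
opt = Prodigy(net.parameters(), lr=1., weight_decay=0.01, decouple=False)

# Force a larger learning-rate estimate; use d_coef=0.5 or 0.1 for a smaller one.
opt = Prodigy(net.parameters(), lr=1., d_coef=2.)

If you find our work useful, please consider citing our paper: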
@article{mishchenko2023prodigy,
  title={Prodigy: An Expeditiously Adaptive Parameter-Free Learner},
  author={Mishchenko, Konstantin and Defazio, Aaron},
  journal={arXiv preprint arXiv:2306.06101},
  year={2023},
  url={https://arxiv.org/pdf/2306.06101.pdf}
}