AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop

Jing Wang; Yunfei Teng; Anna Choromanska

AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop

Jing Wang, Yunfei Teng, Anna Choromanska

Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:3603-3629, 2024.

Abstract

Modern deep learning (DL) architectures are trained using variants of the SGD algorithm and typically rely on the user to manually drop the learning rate when the training curve saturates. In this paper, we develop an algorithm, that we call AutoDrop, that realizes the learning rate drop automatically and stems from the properties of the learning dynamics of DL systems. Specifically, it is motivated by the observation that the angular velocity of the model parameters, i.e., the velocity of the changes of the convergence direction, for a fixed learning rate initially increases rapidly and then progresses towards soft saturation. At saturation, the optimizer slows down thus the angular velocity saturation is a good indicator for dropping the learning rate. After the drop, the angular velocity {“}resets{”} and follows the pattern described above, increasing again until saturation. AutoDrop is built on this idea and drops the learning rate whenever the angular velocity saturates. The method is simple to implement, computationally cheap, and by design avoids the short-horizon bias problem. We show that AutoDrop achieves favorable performance compared to many different baseline manual and automatic learning rate schedulers, and matches the SOTA performance on all our experiments. On the theoretical front, we claim two contributions: we formulate the learning rate behavior based on the angular velocity and provide general convergence theory for the learning rate schedulers that decrease the learning rate step-wise, rather than continuously as is commonly analyzed.

Cite this Paper

BibTeX

@InProceedings{pmlr-v244-wang24e,
  title = 	 {AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop},
  author =       {Wang, Jing and Teng, Yunfei and Choromanska, Anna},
  booktitle = 	 {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {3603--3629},
  year = 	 {2024},
  editor = 	 {Kiyavash, Negar and Mooij, Joris M.},
  volume = 	 {244},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v244/main/assets/wang24e/wang24e.pdf},
  url = 	 {https://proceedings.mlr.press/v244/wang24e.html},
  abstract = 	 {Modern deep learning (DL) architectures are trained using variants of the SGD algorithm and typically rely on the user to manually drop the learning rate when the training curve saturates. In this paper, we develop an algorithm, that we call AutoDrop, that realizes the learning rate drop automatically and stems from the properties of the learning dynamics of DL systems. Specifically, it is motivated by the observation that the angular velocity of the model parameters, i.e., the velocity of the changes of the convergence direction, for a fixed learning rate initially increases rapidly and then progresses towards soft saturation. At saturation, the optimizer slows down thus the angular velocity saturation is a good indicator for dropping the learning rate. After the drop, the angular velocity {“}resets{”} and follows the pattern described above, increasing again until saturation. AutoDrop is built on this idea and drops the learning rate whenever the angular velocity saturates. The method is simple to implement, computationally cheap, and by design avoids the short-horizon bias problem. We show that AutoDrop achieves favorable performance compared to many different baseline manual and automatic learning rate schedulers, and matches the SOTA performance on all our experiments. On the theoretical front, we claim two contributions: we formulate the learning rate behavior based on the angular velocity and provide general convergence theory for the learning rate schedulers that decrease the learning rate step-wise, rather than continuously as is commonly analyzed.}
}

Endnote

%0 Conference Paper
%T AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop
%A Jing Wang
%A Yunfei Teng
%A Anna Choromanska
%B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2024
%E Negar Kiyavash
%E Joris M. Mooij	
%F pmlr-v244-wang24e
%I PMLR
%P 3603--3629
%U https://proceedings.mlr.press/v244/wang24e.html
%V 244
%X Modern deep learning (DL) architectures are trained using variants of the SGD algorithm and typically rely on the user to manually drop the learning rate when the training curve saturates. In this paper, we develop an algorithm, that we call AutoDrop, that realizes the learning rate drop automatically and stems from the properties of the learning dynamics of DL systems. Specifically, it is motivated by the observation that the angular velocity of the model parameters, i.e., the velocity of the changes of the convergence direction, for a fixed learning rate initially increases rapidly and then progresses towards soft saturation. At saturation, the optimizer slows down thus the angular velocity saturation is a good indicator for dropping the learning rate. After the drop, the angular velocity {“}resets{”} and follows the pattern described above, increasing again until saturation. AutoDrop is built on this idea and drops the learning rate whenever the angular velocity saturates. The method is simple to implement, computationally cheap, and by design avoids the short-horizon bias problem. We show that AutoDrop achieves favorable performance compared to many different baseline manual and automatic learning rate schedulers, and matches the SOTA performance on all our experiments. On the theoretical front, we claim two contributions: we formulate the learning rate behavior based on the angular velocity and provide general convergence theory for the learning rate schedulers that decrease the learning rate step-wise, rather than continuously as is commonly analyzed.

APA

Wang, J., Teng, Y. & Choromanska, A.. (2024). AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:3603-3629 Available from https://proceedings.mlr.press/v244/wang24e.html.

AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop

Abstract

Cite this Paper

Related Material