Learning Rate Schedules in the Presence of Distribution Shift

Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik Worah
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:9523-9546, 2023.

Abstract

We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and give upper and lower bounds for the regret that differ only by constant factors. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning rate schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically have higher learning rates in the presence of distribution shift. Finally, we provide experiments that illustrate these learning rate schedules and their regret.
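To make the setting concrete, the sketch below simulates the online linear regression problem from the abstract: SGD tracks an optimum that drifts as a random walk, and a decaying learning rate schedule is compared against a constant one. This is a minimal illustration, not the paper's algorithm or its optimal schedule; the drift model, the 1/sqrt(t) and constant-rate schedules, and all numerical values are assumptions chosen for the example.

# Minimal sketch (not the paper's method): online SGD for linear regression
# under distribution shift, where the optimal model theta_star performs a
# random walk. All schedules and constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, T, drift = 10, 5000, 0.01  # dimension, horizon, per-step drift scale

def cumulative_regret(schedule, drift=drift):
    """Run online SGD with the given learning rate schedule and return the
    cumulative excess risk against the time-varying optimum theta_star."""
    theta_star = rng.normal(size=d) / np.sqrt(d)  # moving target model
    w = np.zeros(d)                               # SGD iterate
    regret = 0.0
    for t in range(1, T + 1):
        # Distribution shift: the optimal model takes a small random step.
        theta_star = theta_star + drift * rng.normal(size=d) / np.sqrt(d)
        x = rng.normal(size=d)                    # features, x ~ N(0, I)
        y = x @ theta_star + 0.1 * rng.normal()   # noisy label
        # For x ~ N(0, I), the expected excess squared loss of w is
        # ||w - theta_star||^2 / 2, which serves as the per-step regret.
        regret += 0.5 * float((w - theta_star) @ (w - theta_star))
        w = w - schedule(t) * (w @ x - y) * x     # SGD step on squared loss
    return regret

print("decaying 0.1/sqrt(t):", cumulative_regret(lambda t: 0.1 / np.sqrt(t)))
print("constant  0.02      :", cumulative_regret(lambda t: 0.02))

Under this drift, the constant rate typically accumulates less excess risk because it keeps tracking the moving optimum, in line with the abstract's observation that distribution shift favors higher learning rates; setting drift = 0.0 typically reverses the comparison in favor of the decaying schedule.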

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-fahrbach23a,
  title     = {Learning Rate Schedules in the Presence of Distribution Shift},
  author    = {Fahrbach, Matthew and Javanmard, Adel and Mirrokni, Vahab and Worah, Pratik},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {9523--9546},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/fahrbach23a/fahrbach23a.pdf},
  url       = {https://proceedings.mlr.press/v202/fahrbach23a.html}
}
Endnote
%0 Conference Paper
%T Learning Rate Schedules in the Presence of Distribution Shift
%A Matthew Fahrbach
%A Adel Javanmard
%A Vahab Mirrokni
%A Pratik Worah
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-fahrbach23a
%I PMLR
%P 9523--9546
%U https://proceedings.mlr.press/v202/fahrbach23a.html
%V 202
APA
Fahrbach, M., Javanmard, A., Mirrokni, V. & Worah, P. (2023). Learning Rate Schedules in the Presence of Distribution Shift. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:9523-9546. Available from https://proceedings.mlr.press/v202/fahrbach23a.html.
