Optimal Ridge Regularization for Out-of-Distribution Prediction

Pratik Patil, Jin-Hong Du, Ryan Tibshirani
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:39908-39954, 2024.

Abstract

We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general conditions that determine the sign of the optimal regularization level under covariate and regression shifts. These conditions capture the alignment between the covariance and signal structures in the train and test data and reveal stark differences compared to the in-distribution setting. For example, a negative regularization level can be optimal under covariate shift or regression shift, even when the training features are isotropic or the design is underparameterized. Furthermore, we prove that the optimally tuned risk is monotonic in the data aspect ratio, even in the out-of-distribution setting and when optimizing over negative regularization levels. In general, our results do not make any modeling assumptions for the train or the test distributions, except for moment bounds, and allow for arbitrary shifts and the widest possible range of (negative) regularization levels.
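The setting described in the abstract can be made concrete with a small simulation. The sketch below is not from the paper; the covariances, signal, noise level, and grid choices are illustrative assumptions. It fits ridge regression with a regularization level that is allowed to be negative (kept above the negative of the smallest sample-covariance eigenvalue so the ridge matrix stays positive definite), evaluates the out-of-distribution prediction risk under a shifted test covariance, and locates the risk-minimizing level on a grid.

import numpy as np

def ridge_estimator(X, y, lam):
    # Ridge estimator (X^T X / n + lam * I)^{-1} X^T y / n.
    # Negative lam is allowed as long as the matrix remains invertible.
    n, p = X.shape
    A = X.T @ X / n + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ y / n)

def ood_risk(beta_hat, beta_star, Sigma_test):
    # Out-of-distribution (excess) prediction risk under the test covariance:
    # (beta_hat - beta_star)^T Sigma_test (beta_hat - beta_star).
    d = beta_hat - beta_star
    return float(d @ Sigma_test @ d)

rng = np.random.default_rng(0)
n, p = 300, 150                                   # data aspect ratio p/n = 0.5 (underparameterized)
Sigma_train = np.eye(p)                           # isotropic training covariance
Sigma_test = np.diag(np.linspace(0.2, 5.0, p))    # covariate shift: different test covariance
beta_star = rng.normal(size=p) / np.sqrt(p)       # signal vector

X = rng.normal(size=(n, p)) @ np.linalg.cholesky(Sigma_train).T
y = X @ beta_star + rng.normal(scale=0.5, size=n)

# Sweep regularization levels, including negative ones, staying above the
# negative of the smallest sample-covariance eigenvalue.
lam_min = np.linalg.eigvalsh(X.T @ X / n).min()
lams = np.linspace(-0.9 * lam_min, 1.0, 200)
risks = [ood_risk(ridge_estimator(X, y, lam), beta_star, Sigma_test) for lam in lams]
print(f"risk-minimizing lambda on this grid: {lams[int(np.argmin(risks))]:.3f}")

Under shifts like the diagonal test covariance assumed above, the grid minimizer can land at a negative value even though the training features are isotropic and p < n, which is the qualitative phenomenon the abstract highlights.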

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-patil24a,
  title     = {Optimal Ridge Regularization for Out-of-Distribution Prediction},
  author    = {Patil, Pratik and Du, Jin-Hong and Tibshirani, Ryan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {39908--39954},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/patil24a/patil24a.pdf},
  url       = {https://proceedings.mlr.press/v235/patil24a.html},
  abstract  = {We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general conditions that determine the sign of the optimal regularization level under covariate and regression shifts. These conditions capture the alignment between the covariance and signal structures in the train and test data and reveal stark differences compared to the in-distribution setting. For example, a negative regularization level can be optimal under covariate shift or regression shift, even when the training features are isotropic or the design is underparameterized. Furthermore, we prove that the optimally tuned risk is monotonic in the data aspect ratio, even in the out-of-distribution setting and when optimizing over negative regularization levels. In general, our results do not make any modeling assumptions for the train or the test distributions, except for moment bounds, and allow for arbitrary shifts and the widest possible range of (negative) regularization levels.}
}
Endnote
%0 Conference Paper
%T Optimal Ridge Regularization for Out-of-Distribution Prediction
%A Pratik Patil
%A Jin-Hong Du
%A Ryan Tibshirani
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-patil24a
%I PMLR
%P 39908--39954
%U https://proceedings.mlr.press/v235/patil24a.html
%V 235
%X We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general conditions that determine the sign of the optimal regularization level under covariate and regression shifts. These conditions capture the alignment between the covariance and signal structures in the train and test data and reveal stark differences compared to the in-distribution setting. For example, a negative regularization level can be optimal under covariate shift or regression shift, even when the training features are isotropic or the design is underparameterized. Furthermore, we prove that the optimally tuned risk is monotonic in the data aspect ratio, even in the out-of-distribution setting and when optimizing over negative regularization levels. In general, our results do not make any modeling assumptions for the train or the test distributions, except for moment bounds, and allow for arbitrary shifts and the widest possible range of (negative) regularization levels.
APA
Patil, P., Du, J. & Tibshirani, R. (2024). Optimal Ridge Regularization for Out-of-Distribution Prediction. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:39908-39954. Available from https://proceedings.mlr.press/v235/patil24a.html.
