Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent

Pratik Patil, Yuchen Wu, Ryan Tibshirani
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2260-2268, 2024.

Abstract

We analyze the statistical properties of generalized cross-validation (GCV) and leave-one-out cross-validation (LOOCV) applied to early-stopped gradient descent (GD) in high-dimensional least squares regression. We prove that GCV is generically inconsistent as an estimator of the prediction risk of early-stopped GD, even for a well-specified linear model with isotropic features. In contrast, we show that LOOCV converges uniformly along the GD trajectory to the prediction risk. Our theory requires only mild assumptions on the data distribution and does not require the underlying regression function to be linear. Furthermore, by leveraging the individual LOOCV errors, we construct consistent estimators for the entire prediction error distribution along the GD trajectory and consistent estimators for a wide class of error functionals. This in particular enables the construction of pathwise prediction intervals based on GD iterates that have asymptotically correct nominal coverage conditional on the training data.
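
To make the GCV/LOOCV contrast in the abstract concrete, here is a minimal simulation sketch in Python. Everything below is illustrative and ours, not the paper's: the function names and parameter choices are hypothetical, and the LOOCV shown is the naive brute-force version (rerunning GD with each observation held out) rather than the paper's estimator. It uses the fact that, from a zero initialization, the GD iterate for least squares is a linear smoother, so GCV can be formed from its degrees of freedom.

```python
import numpy as np

def gd_path(X, y, eta, T):
    """Gradient descent iterates for (1/2n)||y - X b||^2, started at zero.

    Returns the list [b_0, b_1, ..., b_T] of iterates along the path.
    """
    n, p = X.shape
    b = np.zeros(p)
    path = [b.copy()]
    for _ in range(T):
        b = b + eta * X.T @ (y - X @ b) / n
        path.append(b.copy())
    return path

def gcv_curve(X, y, eta, T):
    """GCV along the GD path.

    From zero init, b_t = A_t X^T y / n with A_t = eta * sum_{k<t} (I - eta*Sigma)^k,
    so the hat matrix is S_t = X A_t X^T / n and tr(S_t) = tr(A_t Sigma).
    GCV(t) = (mean squared residual) / (1 - tr(S_t)/n)^2.
    """
    n, p = X.shape
    Sigma = X.T @ X / n
    path = gd_path(X, y, eta, T)
    A = np.zeros((p, p))   # A_t, starting from A_0 = 0
    M = np.eye(p)          # (I - eta*Sigma)^t
    out = []
    for t in range(T + 1):
        df = np.trace(A @ Sigma)          # tr(S_t), the GD degrees of freedom
        resid = y - X @ path[t]
        out.append(np.mean(resid ** 2) / (1.0 - df / n) ** 2)
        A = A + eta * M
        M = M @ (np.eye(p) - eta * Sigma)
    return np.array(out)   # GCV estimate at steps t = 0, ..., T

def loocv_curve(X, y, eta, T):
    """Brute-force LOOCV along the GD path: rerun GD n times, each time
    leaving one observation out, and average squared errors at every step."""
    n, _ = X.shape
    errs = np.zeros((n, T + 1))
    for i in range(n):
        mask = np.arange(n) != i
        path = gd_path(X[mask], y[mask], eta, T)
        preds = np.array([X[i] @ b for b in path])
        errs[i] = (y[i] - preds) ** 2
    return errs.mean(axis=0)  # LOOCV risk estimate at steps t = 0, ..., T

# Illustrative usage on isotropic synthetic data (hypothetical choices):
rng = np.random.default_rng(0)
n, p = 200, 100
X = rng.standard_normal((n, p))
y = X @ (rng.standard_normal(p) / np.sqrt(p)) + rng.standard_normal(n)
loo = loocv_curve(X, y, eta=0.1, T=50)
gcv = gcv_curve(X, y, eta=0.1, T=50)
```

Comparing both curves against the error on an independently drawn test set illustrates the paper's message: the LOOCV curve should track the prediction risk along the path, whereas, per the inconsistency result, GCV need not, even in this isotropic, well-specified setting.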

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-patil24a,
  title     = {Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent},
  author    = {Patil, Pratik and Wu, Yuchen and Tibshirani, Ryan},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2260--2268},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/patil24a/patil24a.pdf},
  url       = {https://proceedings.mlr.press/v238/patil24a.html},
  abstract  = {We analyze the statistical properties of generalized cross-validation (GCV) and leave-one-out cross-validation (LOOCV) applied to early-stopped gradient descent (GD) in high-dimensional least squares regression. We prove that GCV is generically inconsistent as an estimator of the prediction risk of early-stopped GD, even for a well-specified linear model with isotropic features. In contrast, we show that LOOCV converges uniformly along the GD trajectory to the prediction risk. Our theory requires only mild assumptions on the data distribution and does not require the underlying regression function to be linear. Furthermore, by leveraging the individual LOOCV errors, we construct consistent estimators for the entire prediction error distribution along the GD trajectory and consistent estimators for a wide class of error functionals. This in particular enables the construction of pathwise prediction intervals based on GD iterates that have asymptotically correct nominal coverage conditional on the training data.}
}
Endnote
%0 Conference Paper
%T Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent
%A Pratik Patil
%A Yuchen Wu
%A Ryan Tibshirani
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-patil24a
%I PMLR
%P 2260--2268
%U https://proceedings.mlr.press/v238/patil24a.html
%V 238
%X We analyze the statistical properties of generalized cross-validation (GCV) and leave-one-out cross-validation (LOOCV) applied to early-stopped gradient descent (GD) in high-dimensional least squares regression. We prove that GCV is generically inconsistent as an estimator of the prediction risk of early-stopped GD, even for a well-specified linear model with isotropic features. In contrast, we show that LOOCV converges uniformly along the GD trajectory to the prediction risk. Our theory requires only mild assumptions on the data distribution and does not require the underlying regression function to be linear. Furthermore, by leveraging the individual LOOCV errors, we construct consistent estimators for the entire prediction error distribution along the GD trajectory and consistent estimators for a wide class of error functionals. This in particular enables the construction of pathwise prediction intervals based on GD iterates that have asymptotically correct nominal coverage conditional on the training data.
APA
Patil, P., Wu, Y. & Tibshirani, R. (2024). Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2260-2268. Available from https://proceedings.mlr.press/v238/patil24a.html.