No Double Descent in Principal Component Regression: A High-Dimensional Analysis

Daniel Gedon, Antonio H. Ribeiro, Thomas B. Schön
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:15271-15293, 2024.

Abstract

Understanding the generalization properties of large-scale models necessitates incorporating realistic data assumptions into the analysis. Therefore, we consider Principal Component Regression (PCR)—combining principal component analysis and linear regression—on data from a low-dimensional manifold. We present an analysis of PCR when the data is sampled from a spiked covariance model, obtaining fundamental asymptotic guarantees for the generalization risk of this model. Our analysis is based on random matrix theory and allows us to provide guarantees for high-dimensional data. We additionally present an analysis of the distribution shift between training and test data. The results allow us to disentangle the effects of (1) the number of parameters, (2) the data-generating model, and (3) model misspecification on the generalization risk. The use of PCR effectively regularizes the model and prevents the interpolation peak of double descent. Our theoretical findings are empirically validated in simulation, demonstrating their practical relevance.
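
To make the setup concrete, the following is a minimal Python sketch of PCR on synthetic spiked-covariance data: project the inputs onto the top-k principal components of the training set, then run ordinary least squares in that subspace. The generator below (the number of spikes r, the spike strength, the noise level) and the choice of k are illustrative assumptions for the sketch, not the paper's exact parameterization.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical spiked-covariance data: d-dimensional inputs whose covariance
# has r strong "spike" directions on top of an isotropic noise bulk.
n, d, r = 200, 500, 5
spikes = 5.0 * rng.standard_normal((d, r))            # low-rank signal directions
X = rng.standard_normal((n, r)) @ spikes.T + rng.standard_normal((n, d))
beta = rng.standard_normal(d) / np.sqrt(d)            # ground-truth linear model
y = X @ beta + 0.1 * rng.standard_normal(n)

# PCR step 1: PCA via SVD of the centered training inputs.
k = 20                                                # number of retained components
x_mean, y_mean = X.mean(axis=0), y.mean()
_, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)

# PCR step 2: ordinary least squares on the top-k principal-component scores.
Z = (X - x_mean) @ Vt[:k].T
theta, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)

# Prediction reuses the training mean and projection.
X_test = rng.standard_normal((50, r)) @ spikes.T + rng.standard_normal((50, d))
y_hat = (X_test - x_mean) @ Vt[:k].T @ theta + y_mean

Truncating to k components caps the effective number of parameters, which is the regularization mechanism the abstract credits with removing the interpolation peak.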

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-gedon24a,
  title     = {No Double Descent in Principal Component Regression: A High-Dimensional Analysis},
  author    = {Gedon, Daniel and Ribeiro, Antonio H. and Sch\"{o}n, Thomas B.},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {15271--15293},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/gedon24a/gedon24a.pdf},
  url       = {https://proceedings.mlr.press/v235/gedon24a.html}
}
Endnote
%0 Conference Paper
%T No Double Descent in Principal Component Regression: A High-Dimensional Analysis
%A Daniel Gedon
%A Antonio H. Ribeiro
%A Thomas B. Schön
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-gedon24a
%I PMLR
%P 15271--15293
%U https://proceedings.mlr.press/v235/gedon24a.html
%V 235
APA
Gedon, D., Ribeiro, A. H., & Schön, T. B. (2024). No Double Descent in Principal Component Regression: A High-Dimensional Analysis. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:15271-15293. Available from https://proceedings.mlr.press/v235/gedon24a.html.
