Universality of High-Dimensional Logistic Regression and a Novel CGMT under Dependence with Applications to Data Augmentation

Matthew Esmaili Mallory, Kevin Han Huang, Morgane Austern
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:1799-1918, 2025.

Abstract

Over the last decade, a wave of research has characterized the exact asymptotic risk of many high-dimensional models in the proportional regime. Two foundational results have driven this progress: Gaussian universality, which shows that the asymptotic risk of estimators trained on non-Gaussian and Gaussian data is equivalent, and the convex Gaussian min-max theorem (CGMT), which characterizes the risk under Gaussian settings. However, these results rely on the assumption that the data consists of independent random vectors, an assumption that significantly limits their applicability to many practical setups. In this paper, we address this limitation by generalizing both results to the dependent setting. More precisely, we prove that Gaussian universality still holds for high-dimensional logistic regression under block dependence, $m$-dependence, and special cases of $\beta$-mixing, and establish a novel CGMT framework that accommodates correlation across both the covariates and observations. Using these results, we establish the impact of data augmentation, a widespread practice in deep learning, on the asymptotic risk.
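As context for the abstract's key tool, the following is a minimal LaTeX sketch of the classical CGMT for an i.i.d. Gaussian design, in the form popularized by Thrampoulidis, Oymak, and Hassibi (2015); this is the independent-data baseline that the paper generalizes, and the notation below is illustrative rather than taken from the paper itself.

% Classical CGMT sketch (i.i.d. Gaussian design); the paper extends this
% setting to correlation across covariates and observations.
% Primary optimization (PO), with $G \in \mathbb{R}^{n \times d}$ having i.i.d. $\mathcal{N}(0,1)$ entries:
\[
  \Phi(G) \;=\; \min_{w \in S_w} \max_{u \in S_u} \; u^\top G w + \psi(w, u),
\]
% Auxiliary optimization (AO), with $g \sim \mathcal{N}(0, I_n)$ and $h \sim \mathcal{N}(0, I_d)$ independent:
\[
  \phi(g, h) \;=\; \min_{w \in S_w} \max_{u \in S_u} \; \|w\|_2 \, g^\top u + \|u\|_2 \, h^\top w + \psi(w, u).
\]
% With $S_w, S_u$ compact and $\psi$ continuous, for every $c$:
%   $\mathbb{P}(\Phi(G) < c) \le 2\,\mathbb{P}(\phi(g,h) \le c)$;
% if moreover $S_w, S_u$ are convex and $\psi$ is convex-concave, then also
%   $\mathbb{P}(\Phi(G) > c) \le 2\,\mathbb{P}(\phi(g,h) \ge c)$,
% so the simpler AO controls the PO from both sides, which is what lets
% one read off the asymptotic risk of estimators such as logistic regression.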

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-esmaili-mallory25a,
  title = {Universality of High-Dimensional Logistic Regression and a Novel CGMT under Dependence with Applications to Data Augmentation},
  author = {Esmaili Mallory, Matthew and Huang, Kevin Han and Austern, Morgane},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages = {1799--1918},
  year = {2025},
  editor = {Haghtalab, Nika and Moitra, Ankur},
  volume = {291},
  series = {Proceedings of Machine Learning Research},
  month = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/esmaili-mallory25a/esmaili-mallory25a.pdf},
  url = {https://proceedings.mlr.press/v291/esmaili-mallory25a.html},
  abstract = {Over the last decade, a wave of research has characterized the exact asymptotic risk of many high-dimensional models in the proportional regime. Two foundational results have driven this progress: Gaussian universality, which shows that the asymptotic risk of estimators trained on non-Gaussian and Gaussian data is equivalent, and the convex Gaussian min-max theorem (CGMT), which characterizes the risk under Gaussian settings. However, these results rely on the assumption that the data consists of independent random vectors, an assumption that significantly limits their applicability to many practical setups. In this paper, we address this limitation by generalizing both results to the dependent setting. More precisely, we prove that Gaussian universality still holds for high-dimensional logistic regression under block dependence, $m$-dependence, and special cases of $\beta$-mixing, and establish a novel CGMT framework that accommodates correlation across both the covariates and observations. Using these results, we establish the impact of data augmentation, a widespread practice in deep learning, on the asymptotic risk.}
}
Endnote
%0 Conference Paper
%T Universality of High-Dimensional Logistic Regression and a Novel CGMT under Dependence with Applications to Data Augmentation
%A Matthew Esmaili Mallory
%A Kevin Han Huang
%A Morgane Austern
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-esmaili-mallory25a
%I PMLR
%P 1799--1918
%U https://proceedings.mlr.press/v291/esmaili-mallory25a.html
%V 291
%X Over the last decade, a wave of research has characterized the exact asymptotic risk of many high-dimensional models in the proportional regime. Two foundational results have driven this progress: Gaussian universality, which shows that the asymptotic risk of estimators trained on non-Gaussian and Gaussian data is equivalent, and the convex Gaussian min-max theorem (CGMT), which characterizes the risk under Gaussian settings. However, these results rely on the assumption that the data consists of independent random vectors, an assumption that significantly limits their applicability to many practical setups. In this paper, we address this limitation by generalizing both results to the dependent setting. More precisely, we prove that Gaussian universality still holds for high-dimensional logistic regression under block dependence, $m$-dependence, and special cases of $\beta$-mixing, and establish a novel CGMT framework that accommodates correlation across both the covariates and observations. Using these results, we establish the impact of data augmentation, a widespread practice in deep learning, on the asymptotic risk.
APA
Esmaili Mallory, M., Huang, K.H. & Austern, M. (2025). Universality of High-Dimensional Logistic Regression and a Novel CGMT under Dependence with Applications to Data Augmentation. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:1799-1918. Available from https://proceedings.mlr.press/v291/esmaili-mallory25a.html.
