The Role of Regularization in Classification of High-dimensional Noisy Gaussian Mixture

Francesca Mignacco, Florent Krzakala, Yue Lu, Pierfrancesco Urbani, Lenka Zdeborova
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:6874-6883, 2020.

Abstract

We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge, and logistic regression, in the high-dimensional limit where the number $n$ of samples and their dimension $d$ go to infinity while their ratio is fixed at $\alpha=n/d$. We discuss surprising effects of the regularization that in some cases allow reaching Bayes-optimal performance. We also illustrate the interpolation peak at low regularization and analyze the role of the relative sizes of the two clusters.
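
To make the setting concrete, the short simulation below is a minimal sketch (not the authors' code or theory): it draws n = αd samples from a symmetric two-cluster Gaussian mixture and fits an L2-regularized logistic regression, one of the convex classifiers analyzed in the paper. The cluster centers at ±μ with ‖μ‖ of order one, the unit-variance noise, and all numerical constants are illustrative assumptions; they reproduce the noisy regime in which even the oracle direction μ misclassifies a finite fraction of the points.

# Minimal simulation sketch of the two-Gaussian mixture setting (assumed
# scalings, not the paper's exact model): x = y * mu + z with z ~ N(0, I_d),
# labels y = +/-1, and n = alpha * d samples.
import numpy as np
from scipy.special import erfc
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, alpha, snr = 500, 2.0, 1.0                      # dimension, ratio n/d, ~|mu|
n = int(alpha * d)

mu = snr * rng.standard_normal(d) / np.sqrt(d)     # cluster center, |mu| ~ snr
y = rng.choice([-1, 1], size=n)                    # balanced clusters
X = y[:, None] * mu[None, :] + rng.standard_normal((n, d))

# L2-regularized logistic regression; C is the inverse regularization strength.
clf = LogisticRegression(C=1.0, fit_intercept=False).fit(X, y)

# Generalization error estimated on fresh samples from the same mixture.
y_new = rng.choice([-1, 1], size=10 * n)
X_new = y_new[:, None] * mu[None, :] + rng.standard_normal((10 * n, d))
test_error = np.mean(clf.predict(X_new) != y_new)

# Oracle error: the classifier sign(mu . x), which knows the center, still
# errs with probability (1/2) * erfc(|mu| / sqrt(2)) -- finite at fixed snr.
oracle_error = 0.5 * erfc(np.linalg.norm(mu) / np.sqrt(2))
print(f"test error {test_error:.3f}  vs  oracle error {oracle_error:.3f}")

Sweeping C (the inverse regularization strength) in such a simulation gives an empirical counterpart to the regularization effects and the low-regularization interpolation peak discussed in the paper.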

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-mignacco20a,
  title     = {The Role of Regularization in Classification of High-dimensional Noisy {G}aussian Mixture},
  author    = {Mignacco, Francesca and Krzakala, Florent and Lu, Yue and Urbani, Pierfrancesco and Zdeborova, Lenka},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {6874--6883},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/mignacco20a/mignacco20a.pdf},
  url       = {https://proceedings.mlr.press/v119/mignacco20a.html}
}
Endnote
%0 Conference Paper
%T The Role of Regularization in Classification of High-dimensional Noisy Gaussian Mixture
%A Francesca Mignacco
%A Florent Krzakala
%A Yue Lu
%A Pierfrancesco Urbani
%A Lenka Zdeborova
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-mignacco20a
%I PMLR
%P 6874--6883
%U https://proceedings.mlr.press/v119/mignacco20a.html
%V 119
APA
Mignacco, F., Krzakala, F., Lu, Y., Urbani, P. & Zdeborova, L. (2020). The Role of Regularization in Classification of High-dimensional Noisy Gaussian Mixture. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:6874-6883. Available from https://proceedings.mlr.press/v119/mignacco20a.html.