Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

Maria Refinetti, Sebastian Goldt, Florent Krzakala, Lenka Zdeborova
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8936-8947, 2021.

Abstract

A recent series of theoretical works showed that the dynamics of neural networks with a certain initialisation are well-captured by kernel methods. Concurrent empirical work demonstrated that kernel methods can come close to the performance of neural networks on some image classification tasks. These results raise the question of whether neural networks only learn successfully if kernels also learn successfully, despite being the more expressive function class. Here, we show that two-layer neural networks with *only a few neurons* achieve near-optimal performance on high-dimensional Gaussian mixture classification while lazy training approaches such as random features and kernel methods do not. Our analysis is based on the derivation of a set of ordinary differential equations that exactly track the dynamics of the network and thus allow us to extract the asymptotic performance of the network as a function of regularisation or signal-to-noise ratio. We also show how over-parametrising the neural network leads to faster convergence, but does not improve its final performance.
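To make the setting concrete, below is a minimal, self-contained sketch (not the authors' code) of the kind of task the abstract describes: a high-dimensional Gaussian mixture with an XOR-like labelling, classified by (i) a two-layer network with a handful of neurons trained by online SGD and (ii) a random-features ridge model standing in for lazy training. The cluster geometry, signal-to-noise ratio, learning rate and all sizes are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch only: an XOR-like Gaussian mixture in d dimensions,
# classified by a small two-layer tanh network (SGD) and by random features.
# All hyperparameters below are assumed, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, snr = 200, 20000, 4000, 4.0

# Four clusters centred at (+-mu1 +- mu2); the label is the product of the two
# signs, so no single linear direction separates the classes.
mu1, mu2 = np.zeros(d), np.zeros(d)
mu1[0], mu2[1] = snr, snr

def sample(n):
    s1 = rng.choice([-1.0, 1.0], size=n)
    s2 = rng.choice([-1.0, 1.0], size=n)
    X = s1[:, None] * mu1 + s2[:, None] * mu2 + rng.standard_normal((n, d))
    return X, s1 * s2

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)

# (i) two-layer tanh network with K hidden neurons, online SGD on the squared loss
K, lr, epochs = 8, 0.02, 3
W = rng.standard_normal((K, d)) / np.sqrt(d)   # first-layer weights
v = rng.standard_normal(K) / np.sqrt(K)        # second-layer weights
for _ in range(epochs):
    for x, y in zip(Xtr, ytr):
        h = np.tanh(W @ x)                     # hidden activations
        err = v @ h - y                        # squared-loss residual
        grad_v = err * h
        grad_W = np.outer(err * v * (1.0 - h**2), x)   # backprop through tanh
        v -= lr * grad_v
        W -= lr * grad_W

acc_nn = np.mean(np.sign(np.tanh(Xte @ W.T) @ v) == yte)

# (ii) random-features ridge regression: first layer fixed, only the readout is fit
P, lam = 2000, 1e-2
F = rng.standard_normal((d, P)) / np.sqrt(d)   # fixed random projection
Ztr, Zte = np.tanh(Xtr @ F), np.tanh(Xte @ F)
a = np.linalg.solve(Ztr.T @ Ztr + lam * np.eye(P), Ztr.T @ ytr)
acc_rf = np.mean(np.sign(Zte @ a) == yte)

print(f"two-layer network test accuracy: {acc_nn:.3f}")
print(f"random features test accuracy:   {acc_rf:.3f}")
```

With choices of this kind, the trained network can rotate its few neurons towards the two mixture directions, whereas the fixed random features typically need far more samples or features to separate the XOR-like clusters, in the spirit of the gap the paper analyses.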

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-refinetti21b,
  title     = {Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed},
  author    = {Refinetti, Maria and Goldt, Sebastian and Krzakala, Florent and Zdeborova, Lenka},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8936--8947},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/refinetti21b/refinetti21b.pdf},
  url       = {https://proceedings.mlr.press/v139/refinetti21b.html},
  abstract  = {A recent series of theoretical works showed that the dynamics of neural networks with a certain initialisation are well-captured by kernel methods. Concurrent empirical work demonstrated that kernel methods can come close to the performance of neural networks on some image classification tasks. These results raise the question of whether neural networks only learn successfully if kernels also learn successfully, despite being the more expressive function class. Here, we show that two-layer neural networks with *only a few neurons* achieve near-optimal performance on high-dimensional Gaussian mixture classification while lazy training approaches such as random features and kernel methods do not. Our analysis is based on the derivation of a set of ordinary differential equations that exactly track the dynamics of the network and thus allow to extract the asymptotic performance of the network as a function of regularisation or signal-to-noise ratio. We also show how over-parametrising the neural network leads to faster convergence, but does not improve its final performance.}
}
Endnote
%0 Conference Paper
%T Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed
%A Maria Refinetti
%A Sebastian Goldt
%A Florent Krzakala
%A Lenka Zdeborova
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-refinetti21b
%I PMLR
%P 8936--8947
%U https://proceedings.mlr.press/v139/refinetti21b.html
%V 139
%X A recent series of theoretical works showed that the dynamics of neural networks with a certain initialisation are well-captured by kernel methods. Concurrent empirical work demonstrated that kernel methods can come close to the performance of neural networks on some image classification tasks. These results raise the question of whether neural networks only learn successfully if kernels also learn successfully, despite being the more expressive function class. Here, we show that two-layer neural networks with *only a few neurons* achieve near-optimal performance on high-dimensional Gaussian mixture classification while lazy training approaches such as random features and kernel methods do not. Our analysis is based on the derivation of a set of ordinary differential equations that exactly track the dynamics of the network and thus allow to extract the asymptotic performance of the network as a function of regularisation or signal-to-noise ratio. We also show how over-parametrising the neural network leads to faster convergence, but does not improve its final performance.
APA
Refinetti, M., Goldt, S., Krzakala, F. & Zdeborova, L. (2021). Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8936-8947. Available from https://proceedings.mlr.press/v139/refinetti21b.html.