Learning Neural Network Subspaces

Mitchell Wortsman, Maxwell C Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11217-11227, 2021.

Abstract

Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast, we aim to leverage both properties (1) and (2) with a single method and in a single training run. With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.
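The simplest subspace described in the abstract is a line: two learnable weight endpoints, with each training step drawing a random point on the segment between them. The sketch below illustrates that parameterization with NumPy on toy weight vectors; the names, sizes, and sampling details are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two learnable weight endpoints defining a line in weight space.
# These stand in for full network parameter vectors (size is arbitrary here).
w1 = rng.normal(size=8)
w2 = rng.normal(size=8)

def point_on_line(alpha):
    """Weights at position alpha in [0, 1] along the line from w1 to w2."""
    return (1 - alpha) * w1 + alpha * w2

# During training, a random alpha would be drawn each step and the loss of
# the corresponding network minimized (updating both endpoints); here we
# only show the sampling.
alpha = rng.uniform()
w_sampled = point_on_line(alpha)

# The subspace midpoint (alpha = 0.5) is the single model the paper reports
# as improving accuracy, calibration, and robustness.
w_mid = point_on_line(0.5)
```

Points near the two endpoints give the diverse solutions used for ensembling, while the midpoint serves as the single high-performing model.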

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-wortsman21a,
  title     = {Learning Neural Network Subspaces},
  author    = {Wortsman, Mitchell and Horton, Maxwell C and Guestrin, Carlos and Farhadi, Ali and Rastegari, Mohammad},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11217--11227},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/wortsman21a/wortsman21a.pdf},
  url       = {https://proceedings.mlr.press/v139/wortsman21a.html},
  abstract  = {Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast we aim to leverage both property (1) and (2) with a single method and in a single training run. With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.}
}
Endnote
%0 Conference Paper
%T Learning Neural Network Subspaces
%A Mitchell Wortsman
%A Maxwell C Horton
%A Carlos Guestrin
%A Ali Farhadi
%A Mohammad Rastegari
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-wortsman21a
%I PMLR
%P 11217--11227
%U https://proceedings.mlr.press/v139/wortsman21a.html
%V 139
%X Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast we aim to leverage both property (1) and (2) with a single method and in a single training run. With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.
APA
Wortsman, M., Horton, M.C., Guestrin, C., Farhadi, A. &amp; Rastegari, M. (2021). Learning Neural Network Subspaces. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:11217-11227. Available from https://proceedings.mlr.press/v139/wortsman21a.html.