Linear Mode Connectivity and the Lottery Ticket Hypothesis

Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, Michael Carbin
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3259-3269, 2020.

Abstract

We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this way early in training. From then on, the outcome of optimization is determined to a linearly connected region. We use this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy. We find that these subnetworks only reach full accuracy when they are stable to SGD noise, which either occurs at initialization for small-scale settings (MNIST) or early in training for large-scale settings (ResNet-50 and Inception-v3 on ImageNet).
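To make the stability test in the abstract concrete, below is a minimal sketch (not the authors' released code) of the linear interpolation check: train two copies of a network from shared weights under different SGD noise, then evaluate error along the straight line between the two solutions. If error along the path stays near the endpoint errors, the two runs are linearly mode connected and the network counts as stable to SGD noise. The helper `evaluate_error` and the PyTorch state-dict handling are assumptions for illustration.

```python
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Element-wise interpolation (1 - alpha) * sd_a + alpha * sd_b of two state dicts."""
    out = {}
    for key, value in sd_a.items():
        if torch.is_floating_point(value):
            out[key] = (1.0 - alpha) * value + alpha * sd_b[key]
        else:
            # Integer buffers (e.g. BatchNorm's num_batches_tracked) are copied as-is.
            out[key] = value.clone()
    return out

def linear_interpolation_instability(model_a, model_b, evaluate_error, num_points=30):
    """Instability of two trained copies of the same architecture.

    `evaluate_error(state_dict)` is an assumed helper that loads the weights into a
    model and returns error in [0, 1]. The return value is the rise in error along
    the linear path above the mean of the endpoint errors; a value near zero means
    the two solutions are linearly mode connected. Note: for networks with BatchNorm,
    running statistics of interpolated weights are typically re-estimated on training
    data before evaluation.
    """
    sd_a = {k: v.detach().clone() for k, v in model_a.state_dict().items()}
    sd_b = {k: v.detach().clone() for k, v in model_b.state_dict().items()}

    errors = [
        evaluate_error(interpolate_state_dicts(sd_a, sd_b, float(alpha)))
        for alpha in torch.linspace(0.0, 1.0, num_points)
    ]
    return max(errors) - 0.5 * (errors[0] + errors[-1])
```

The iterative magnitude pruning (IMP) procedure mentioned in the abstract can be sketched in the same spirit. The version below prunes the smallest-magnitude surviving weights globally each round and rewinds the survivors to weights saved at an early training iteration; `make_model` and `train` are hypothetical helpers, and enforcing the mask during training is left to `train`.

```python
import torch

def global_magnitude_mask(state_dict, mask, prune_fraction):
    """Zero out the smallest-magnitude fraction of weights that survive the current mask."""
    surviving = torch.cat([
        state_dict[k][mask[k].bool()].abs().flatten() for k in mask
    ])
    num_to_prune = int(prune_fraction * surviving.numel())
    if num_to_prune == 0:
        return mask
    threshold = torch.kthvalue(surviving, num_to_prune).values
    return {k: mask[k] * (state_dict[k].abs() > threshold).float() for k in mask}

def imp_with_rewinding(make_model, train, rewind_state, rounds=5, prune_fraction=0.2):
    """Iterative magnitude pruning with rewinding (a sketch under assumed helpers).

    `make_model()` builds the architecture; `train(model, mask)` is assumed to train
    to completion while keeping masked weights at zero and to return the final state
    dict. `rewind_state` holds weights saved at the rewind iteration (rewinding to
    iteration 0 recovers the original lottery-ticket procedure).
    """
    model = make_model()
    # Prune only weight matrices/kernels, not biases or BatchNorm parameters.
    mask = {k: torch.ones_like(v) for k, v in model.state_dict().items()
            if torch.is_floating_point(v) and v.dim() > 1}

    final_state = train(model, mask)
    for _ in range(rounds):
        mask = global_magnitude_mask(final_state, mask, prune_fraction)
        model = make_model()
        model.load_state_dict(rewind_state)   # rewind surviving weights
        final_state = train(model, mask)
    return model, mask
```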

Cite this Paper

BibTeX
@InProceedings{pmlr-v119-frankle20a,
  title     = {Linear Mode Connectivity and the Lottery Ticket Hypothesis},
  author    = {Frankle, Jonathan and Dziugaite, Gintare Karolina and Roy, Daniel and Carbin, Michael},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {3259--3269},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/frankle20a/frankle20a.pdf},
  url       = {https://proceedings.mlr.press/v119/frankle20a.html},
  abstract  = {We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this way early in training. From then on, the outcome of optimization is determined to a linearly connected region. We use this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy. We find that these subnetworks only reach full accuracy when they are stable to SGD noise, which either occurs at initialization for small-scale settings (MNIST) or early in training for large-scale settings (ResNet-50 and Inception-v3 on ImageNet).}
}
Endnote
%0 Conference Paper
%T Linear Mode Connectivity and the Lottery Ticket Hypothesis
%A Jonathan Frankle
%A Gintare Karolina Dziugaite
%A Daniel Roy
%A Michael Carbin
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-frankle20a
%I PMLR
%P 3259--3269
%U https://proceedings.mlr.press/v119/frankle20a.html
%V 119
%X We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this way early in training. From then on, the outcome of optimization is determined to a linearly connected region. We use this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy. We find that these subnetworks only reach full accuracy when they are stable to SGD noise, which either occurs at initialization for small-scale settings (MNIST) or early in training for large-scale settings (ResNet-50 and Inception-v3 on ImageNet).
APA
Frankle, J., Dziugaite, G.K., Roy, D. & Carbin, M. (2020). Linear Mode Connectivity and the Lottery Ticket Hypothesis. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3259-3269. Available from https://proceedings.mlr.press/v119/frankle20a.html.
