Multicoated Supermasks Enhance Hidden Networks

Yasuyuki Okoshi, Ángel López García-Arias, Kazutoshi Hirose, Kota Ando, Kazushi Kawamura, Thiem Van Chu, Masato Motomura, Jaehoon Yu
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:17045-17055, 2022.

Abstract

Hidden Networks (Ramanujan et al., 2020) showed the possibility of finding accurate subnetworks within a randomly weighted neural network by training a connectivity mask, referred to as supermask. We show that the supermask stops improving even though gradients are not zero, thus underutilizing backpropagated information. To address this, we propose a method that extends Hidden Networks by training an overlay of multiple hierarchical supermasks: a multicoated supermask. This method shows that using multiple supermasks for a single task achieves higher accuracy without additional training cost. Experiments on CIFAR-10 and ImageNet show that Multicoated Supermasks enhance the tradeoff between accuracy and model size. A ResNet-101 using a 7-coated supermask outperforms its Hidden Networks counterpart by 4%, matching the accuracy of a dense ResNet-50 while being an order of magnitude smaller.
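
The abstract only sketches the mechanism, so the following is a minimal, hypothetical PyTorch-style illustration of the idea: frozen random weights, a trainable score per weight, and an overlay of nested top-k supermasks trained through a straight-through estimator. The class names (TopKMask, MulticoatedLinear), the keep-rates in sparsities, the score initialization, and the 1/N rescaling are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMask(torch.autograd.Function):
    """Straight-through top-k: binary mask in the forward pass,
    identity gradient for the scores in the backward pass."""
    @staticmethod
    def forward(ctx, scores, sparsity):
        k = int((1.0 - sparsity) * scores.numel())   # number of kept weights
        flat = scores.flatten().abs()
        threshold = flat.kthvalue(flat.numel() - k + 1).values  # k-th largest score
        return (scores.abs() >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # straight-through estimator for the scores

class MulticoatedLinear(nn.Linear):
    """Linear layer with frozen random weights and one trainable score per
    weight. Several nested supermasks at increasing sparsity are summed, so
    high-scoring weights are reinforced by every coat they belong to."""
    def __init__(self, in_features, out_features, sparsities=(0.5, 0.7, 0.9)):
        super().__init__(in_features, out_features, bias=False)
        self.weight.requires_grad_(False)            # weights stay random and frozen
        self.scores = nn.Parameter(torch.randn_like(self.weight) * 0.01)
        self.sparsities = sparsities                 # assumed keep-rates, not the paper's

    def forward(self, x):
        # Coats are nested because they share one score ranking; their sum
        # acts as an integer-valued multicoated mask on the frozen weights.
        coats = sum(TopKMask.apply(self.scores, s) for s in self.sparsities)
        return F.linear(x, self.weight * coats / len(self.sparsities))

# Only the scores receive gradients; the random weights never change.
layer = MulticoatedLinear(16, 8)
layer(torch.randn(4, 16)).sum().backward()
assert layer.weight.grad is None and layer.scores.grad is not None
```

Because every coat reuses the same scores and the same backward pass, training K coats costs essentially the same as training a single supermask, which is the "no additional training cost" claim in the abstract.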

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-okoshi22a,
  title     = {Multicoated Supermasks Enhance Hidden Networks},
  author    = {Okoshi, Yasuyuki and Garc\'{\i}a-Arias, \'Angel L{\'o}pez and Hirose, Kazutoshi and Ando, Kota and Kawamura, Kazushi and Van Chu, Thiem and Motomura, Masato and Yu, Jaehoon},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {17045--17055},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/okoshi22a/okoshi22a.pdf},
  url       = {https://proceedings.mlr.press/v162/okoshi22a.html},
  abstract  = {Hidden Networks (Ramanujan et al., 2020) showed the possibility of finding accurate subnetworks within a randomly weighted neural network by training a connectivity mask, referred to as supermask. We show that the supermask stops improving even though gradients are not zero, thus underutilizing backpropagated information. To address this, we propose a method that extends Hidden Networks by training an overlay of multiple hierarchical supermasks: a multicoated supermask. This method shows that using multiple supermasks for a single task achieves higher accuracy without additional training cost. Experiments on CIFAR-10 and ImageNet show that Multicoated Supermasks enhance the tradeoff between accuracy and model size. A ResNet-101 using a 7-coated supermask outperforms its Hidden Networks counterpart by 4%, matching the accuracy of a dense ResNet-50 while being an order of magnitude smaller.}
}
Endnote
%0 Conference Paper
%T Multicoated Supermasks Enhance Hidden Networks
%A Yasuyuki Okoshi
%A Ángel López García-Arias
%A Kazutoshi Hirose
%A Kota Ando
%A Kazushi Kawamura
%A Thiem Van Chu
%A Masato Motomura
%A Jaehoon Yu
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-okoshi22a
%I PMLR
%P 17045--17055
%U https://proceedings.mlr.press/v162/okoshi22a.html
%V 162
%X Hidden Networks (Ramanujan et al., 2020) showed the possibility of finding accurate subnetworks within a randomly weighted neural network by training a connectivity mask, referred to as supermask. We show that the supermask stops improving even though gradients are not zero, thus underutilizing backpropagated information. To address this, we propose a method that extends Hidden Networks by training an overlay of multiple hierarchical supermasks: a multicoated supermask. This method shows that using multiple supermasks for a single task achieves higher accuracy without additional training cost. Experiments on CIFAR-10 and ImageNet show that Multicoated Supermasks enhance the tradeoff between accuracy and model size. A ResNet-101 using a 7-coated supermask outperforms its Hidden Networks counterpart by 4%, matching the accuracy of a dense ResNet-50 while being an order of magnitude smaller.
APA
Okoshi, Y., García-Arias, Á. L., Hirose, K., Ando, K., Kawamura, K., Van Chu, T., Motomura, M. & Yu, J. (2022). Multicoated Supermasks Enhance Hidden Networks. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:17045-17055. Available from https://proceedings.mlr.press/v162/okoshi22a.html.