Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

Boris Knyazev, Doha Hwang, Simon Lacoste-Julien
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:17243-17259, 2023.

Abstract

Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large resources. We aim at the ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using predicted parameters for initialization, we are able to boost the training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.
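To make the initialization recipe concrete, the following is a minimal PyTorch sketch of the intended usage: take an architecture available in torchvision, replace its random initialization with predicted parameters, and fine-tune as usual. The predict_parameters function is a hypothetical placeholder for the released parameter-prediction network (here it simply returns the model's current parameters so the sketch runs end to end); only the torchvision model and the standard training-loop calls are real PyTorch API.

import torch
import torchvision

def predict_parameters(model: torch.nn.Module) -> dict:
    """Hypothetical stand-in for the paper's parameter predictor.

    The real predictor would consume the computational graph of `model`
    and return a state dict of predicted ImageNet parameters. Here we
    simply return the model's current (randomly initialized) parameters
    so that the sketch is runnable.
    """
    return {name: p.detach().clone() for name, p in model.state_dict().items()}

# Any architecture available in torchvision can be used, e.g. ResNet-50.
model = torchvision.models.resnet50(weights=None)

# Replace the random initialization with predicted parameters.
model.load_state_dict(predict_parameters(model))

# Fine-tune as usual; the predicted parameters act as a warm start.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One standard supervised training step starting from the warm start."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()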

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-knyazev23a,
  title     = {Can We Scale Transformers to Predict Parameters of Diverse {I}mage{N}et Models?},
  author    = {Knyazev, Boris and Hwang, Doha and Lacoste-Julien, Simon},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {17243--17259},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/knyazev23a/knyazev23a.pdf},
  url       = {https://proceedings.mlr.press/v202/knyazev23a.html},
  abstract  = {Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large resources. We aim at the ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using predicted parameters for initialization, we are able to boost the training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.}
}
Endnote
%0 Conference Paper
%T Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
%A Boris Knyazev
%A Doha Hwang
%A Simon Lacoste-Julien
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-knyazev23a
%I PMLR
%P 17243--17259
%U https://proceedings.mlr.press/v202/knyazev23a.html
%V 202
%X Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large resources. We aim at the ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using predicted parameters for initialization, we are able to boost the training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.
APA
Knyazev, B., Hwang, D. & Lacoste-Julien, S. (2023). Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:17243-17259. Available from https://proceedings.mlr.press/v202/knyazev23a.html.
