Learning to Boost Training by Periodic Nowcasting Near Future Weights

Jinhyeok Jang, Woo-Han Yun, Won Hwa Kim, Youngwoo Yoon, Jaehong Kim, Jaeyeon Lee, Byungok Han
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:14730-14757, 2023.

Abstract

Recent complicated problems require large-scale datasets and complex model architectures; however, it is difficult to train such large networks due to their high computational cost. Significant efforts have been made to make training more efficient, such as momentum, learning rate scheduling, weight regularization, and meta-learning. Based on our observations of 1) the high correlation between past weights and future weights, 2) the conditions for beneficial weight prediction, and 3) the feasibility of weight prediction, we propose a more general framework that intermittently skips a handful of epochs by periodically forecasting near-future weights, i.e., a Weight Nowcaster Network (WNN). As an add-on module, WNN predicts future weights to make the learning process faster regardless of task and architecture. Experimental results show that WNN can significantly reduce the actual time cost of training, with only marginal additional time required to train WNN itself. We validate the generalization capability of WNN on various tasks and demonstrate that it works well even on unseen tasks. The code and pre-trained model are available at https://github.com/jjh6297/WNN.
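
The abstract describes the mechanism only at a high level: train normally, then periodically feed a short history of past weights to an add-on predictor that forecasts near-future weights, and jump the model to that forecast so a handful of epochs can be skipped. The following is a minimal conceptual sketch of that idea in PyTorch, not the authors' WNN implementation (the actual code and pre-trained WNN are in the linked repository); the snapshot window, the tiny per-scalar MLP nowcaster, and the nowcast period below are illustrative assumptions.

```python
# Conceptual sketch of periodic weight nowcasting. NOT the authors' WNN;
# see https://github.com/jjh6297/WNN for the real code and pre-trained model.
# HISTORY, PERIOD, and the Nowcaster architecture are assumed placeholders.
import torch
import torch.nn as nn

HISTORY = 4   # assumed: number of past weight snapshots fed to the nowcaster
PERIOD = 5    # assumed: forecast (skip ahead) once every PERIOD epochs


class Nowcaster(nn.Module):
    """Tiny per-scalar predictor: maps a short history of a weight's values
    to a forecast of its near-future value (stand-in for the WNN add-on)."""

    def __init__(self, history=HISTORY):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(history, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, w_hist):            # w_hist: (num_scalars, history)
        return self.net(w_hist).squeeze(-1)


def snapshot(model):
    # Flatten all parameters into one vector so each scalar weight is treated
    # as an independent short time series (a simplifying assumption here).
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])


def load_vector(model, vec):
    # Write a flat weight vector back into the model's parameters.
    offset = 0
    with torch.no_grad():
        for p in model.parameters():
            n = p.numel()
            p.copy_(vec[offset:offset + n].view_as(p))
            offset += n


def train_with_nowcasting(model, loader, loss_fn, epochs, nowcaster):
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    history = []
    for epoch in range(epochs):
        for x, y in loader:               # ordinary training epoch
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        history.append(snapshot(model))
        history = history[-HISTORY:]      # keep only the recent snapshots
        # Periodically "jump ahead": replace the current weights with the
        # nowcaster's forecast of the near-future weights.
        if len(history) == HISTORY and (epoch + 1) % PERIOD == 0:
            w_hist = torch.stack(history, dim=1)   # (num_scalars, HISTORY)
            with torch.no_grad():
                load_vector(model, nowcaster(w_hist))
```

In this sketch the nowcaster is applied per scalar weight for simplicity; per the abstract, the actual WNN is trained once and then reused as an add-on module across different tasks and architectures, including unseen ones.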

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-jang23b,
  title     = {Learning to Boost Training by Periodic Nowcasting Near Future Weights},
  author    = {Jang, Jinhyeok and Yun, Woo-Han and Kim, Won Hwa and Yoon, Youngwoo and Kim, Jaehong and Lee, Jaeyeon and Han, Byungok},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {14730--14757},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/jang23b/jang23b.pdf},
  url       = {https://proceedings.mlr.press/v202/jang23b.html},
  abstract  = {Recent complicated problems require large-scale datasets and complex model architectures; however, it is difficult to train such large networks due to their high computational cost. Significant efforts have been made to make training more efficient, such as momentum, learning rate scheduling, weight regularization, and meta-learning. Based on our observations of 1) the high correlation between past weights and future weights, 2) the conditions for beneficial weight prediction, and 3) the feasibility of weight prediction, we propose a more general framework that intermittently skips a handful of epochs by periodically forecasting near-future weights, i.e., a Weight Nowcaster Network (WNN). As an add-on module, WNN predicts future weights to make the learning process faster regardless of task and architecture. Experimental results show that WNN can significantly reduce the actual time cost of training, with only marginal additional time required to train WNN itself. We validate the generalization capability of WNN on various tasks and demonstrate that it works well even on unseen tasks. The code and pre-trained model are available at https://github.com/jjh6297/WNN.}
}
Endnote
%0 Conference Paper
%T Learning to Boost Training by Periodic Nowcasting Near Future Weights
%A Jinhyeok Jang
%A Woo-Han Yun
%A Won Hwa Kim
%A Youngwoo Yoon
%A Jaehong Kim
%A Jaeyeon Lee
%A Byungok Han
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-jang23b
%I PMLR
%P 14730--14757
%U https://proceedings.mlr.press/v202/jang23b.html
%V 202
%X Recent complicated problems require large-scale datasets and complex model architectures; however, it is difficult to train such large networks due to their high computational cost. Significant efforts have been made to make training more efficient, such as momentum, learning rate scheduling, weight regularization, and meta-learning. Based on our observations of 1) the high correlation between past weights and future weights, 2) the conditions for beneficial weight prediction, and 3) the feasibility of weight prediction, we propose a more general framework that intermittently skips a handful of epochs by periodically forecasting near-future weights, i.e., a Weight Nowcaster Network (WNN). As an add-on module, WNN predicts future weights to make the learning process faster regardless of task and architecture. Experimental results show that WNN can significantly reduce the actual time cost of training, with only marginal additional time required to train WNN itself. We validate the generalization capability of WNN on various tasks and demonstrate that it works well even on unseen tasks. The code and pre-trained model are available at https://github.com/jjh6297/WNN.
APA
Jang, J., Yun, W., Kim, W.H., Yoon, Y., Kim, J., Lee, J. & Han, B. (2023). Learning to Boost Training by Periodic Nowcasting Near Future Weights. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:14730-14757. Available from https://proceedings.mlr.press/v202/jang23b.html.