Towards Adaptive Residual Network Training: A Neural-ODE Perspective

Chengyu Dong, Liyuan Liu, Zichao Li, Jingbo Shang
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2616-2626, 2020.

Abstract

In pursuit of resource-economical machine learning, attempts have been made to dynamically adjust computation workloads in different training stages, i.e., starting with a shallow network and gradually increasing the model depth (and computation workloads) during training. However, there is neither guarantee nor guidance on designing such network growth, due to the lack of theoretical underpinnings. In this work, to explore the underlying theory, we conduct theoretical analyses from an ordinary differential equation perspective. Specifically, we illustrate the dynamics of network growth and propose a novel performance measure specific to the depth increase. Guided by our analyses, we move towards theoretically sound growing operations and schedulers, giving rise to an adaptive training algorithm for residual networks, LipGrow, which automatically increases network depth and thus accelerates training. In our experiments, it achieves comparable performance while reducing training time by ∼50%.
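
The overall idea of growing a residual network during training can be illustrated with a minimal, hypothetical sketch. The snippet below is not the paper's LipGrow procedure: the fixed growth schedule and the block-duplication step are placeholders standing in for the adaptive (Lipschitz-based) scheduler and the theoretically derived growing operation described in the paper. All names (ResidualMLP, grow, train_with_growth) are illustrative, and PyTorch is assumed.

# Hypothetical sketch of grow-during-training for a residual network.
# The growth trigger and the duplication step are placeholders, NOT the
# paper's actual LipGrow criterion or growing operation.
import copy
import torch
import torch.nn as nn

class ResidualMLP(nn.Module):
    """Toy residual network whose depth can be doubled during training."""
    def __init__(self, dim=32, num_blocks=2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_blocks)
        )

    def forward(self, x):
        # Each residual block is one "Euler step" x <- x + f(x) in the ODE view.
        for block in self.blocks:
            x = x + block(x)
        return x

    def grow(self):
        # Double the depth by duplicating each block (a placeholder operation;
        # the paper derives a principled growing operation from the ODE view).
        new_blocks = []
        for block in self.blocks:
            new_blocks += [block, copy.deepcopy(block)]
        self.blocks = nn.ModuleList(new_blocks)

def train_with_growth(model, data_loader, epochs=30, grow_epochs=(10, 20)):
    """Start shallow and grow at scheduled epochs; a fixed schedule stands in
    for the paper's adaptive scheduler."""
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    for epoch in range(epochs):
        if epoch in grow_epochs:
            model.grow()
            # Rebuild the optimizer so it tracks the newly added parameters.
            optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        for x, y in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model

Early epochs run on a shallow (cheap) network and only the later epochs pay for the full depth, which is where the reported ∼50% reduction in training time comes from.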

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-dong20c,
  title     = {Towards Adaptive Residual Network Training: A Neural-{ODE} Perspective},
  author    = {Dong, Chengyu and Liu, Liyuan and Li, Zichao and Shang, Jingbo},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {2616--2626},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/dong20c/dong20c.pdf},
  url       = {https://proceedings.mlr.press/v119/dong20c.html},
  abstract  = {In pursuit of resource-economical machine learning, attempts have been made to dynamically adjust computation workloads in different training stages, i.e., starting with a shallow network and gradually increasing the model depth (and computation workloads) during training. However, there is neither guarantee nor guidance on designing such network growth, due to the lack of theoretical underpinnings. In this work, to explore the underlying theory, we conduct theoretical analyses from an ordinary differential equation perspective. Specifically, we illustrate the dynamics of network growth and propose a novel performance measure specific to the depth increase. Guided by our analyses, we move towards theoretically sound growing operations and schedulers, giving rise to an adaptive training algorithm for residual networks, LipGrow, which automatically increases network depth and thus accelerates training. In our experiments, it achieves comparable performance while reducing training time by ∼50%.}
}
Endnote
%0 Conference Paper
%T Towards Adaptive Residual Network Training: A Neural-ODE Perspective
%A Chengyu Dong
%A Liyuan Liu
%A Zichao Li
%A Jingbo Shang
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-dong20c
%I PMLR
%P 2616--2626
%U https://proceedings.mlr.press/v119/dong20c.html
%V 119
%X In pursuit of resource-economical machine learning, attempts have been made to dynamically adjust computation workloads in different training stages, i.e., starting with a shallow network and gradually increasing the model depth (and computation workloads) during training. However, there is neither guarantee nor guidance on designing such network growth, due to the lack of theoretical underpinnings. In this work, to explore the underlying theory, we conduct theoretical analyses from an ordinary differential equation perspective. Specifically, we illustrate the dynamics of network growth and propose a novel performance measure specific to the depth increase. Guided by our analyses, we move towards theoretically sound growing operations and schedulers, giving rise to an adaptive training algorithm for residual networks, LipGrow, which automatically increases network depth and thus accelerates training. In our experiments, it achieves comparable performance while reducing training time by ∼50%.
APA
Dong, C., Liu, L., Li, Z. & Shang, J. (2020). Towards Adaptive Residual Network Training: A Neural-ODE Perspective. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2616-2626. Available from https://proceedings.mlr.press/v119/dong20c.html.