An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks

Qianxiao Li, Shuji Hao
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2985-2994, 2018.

Abstract

Deep learning is formulated as a discrete-time optimal control problem. This allows one to characterize necessary conditions for optimality and develop training algorithms that do not rely on gradients with respect to the trainable parameters. In particular, we introduce the discrete-time method of successive approximations (MSA), which is based on Pontryagin’s maximum principle, for training neural networks. A rigorous error estimate for the discrete MSA is obtained, which sheds light on its dynamics and the means to stabilize the algorithm. The developed methods are applied to train, in a rather principled way, neural networks with weights that are constrained to take values in a discrete set. We obtain competitive performance and, interestingly, very sparse weights in the case of ternary networks, which may be useful for model deployment on low-memory devices.
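
To make the formulation concrete, here is a minimal sketch of the discrete-time optimal control problem and of one MSA iteration, written in standard Pontryagin-maximum-principle notation; the symbols f_t, L_t, \Phi, H_t and the sign conventions follow common optimal-control usage and are not taken verbatim from the paper.

Problem:      x_{t+1} = f_t(x_t, \theta_t),  t = 0, \dots, T-1,
              \min_{\theta} J(\theta) = \Phi(x_T) + \sum_{t=0}^{T-1} L_t(x_t, \theta_t).
Hamiltonian:  H_t(x, p, \theta) = p \cdot f_t(x, \theta) - L_t(x, \theta).

Given current parameters \theta^k, one MSA iteration consists of:
Forward pass:   x^k_{t+1} = f_t(x^k_t, \theta^k_t),  with  x^k_0 = x_0.
Backward pass:  p^k_t = \nabla_x H_t(x^k_t, p^k_{t+1}, \theta^k_t),  with  p^k_T = -\nabla\Phi(x^k_T).
Update:         \theta^{k+1}_t = \arg\max_{\theta \in \Theta_t} H_t(x^k_t, p^k_{t+1}, \theta)  for each t.

The update step maximizes the Hamiltonian directly over the admissible set \Theta_t rather than taking a gradient step in \theta, which is why the approach extends naturally to weights constrained to a discrete (e.g. ternary) set, as described above.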

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-li18b,
  title     = {An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks},
  author    = {Li, Qianxiao and Hao, Shuji},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {2985--2994},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/li18b/li18b.pdf},
  url       = {https://proceedings.mlr.press/v80/li18b.html},
  abstract  = {Deep learning is formulated as a discrete-time optimal control problem. This allows one to characterize necessary conditions for optimality and develop training algorithms that do not rely on gradients with respect to the trainable parameters. In particular, we introduce the discrete-time method of successive approximations (MSA), which is based on Pontryagin’s maximum principle, for training neural networks. A rigorous error estimate for the discrete MSA is obtained, which sheds light on its dynamics and the means to stabilize the algorithm. The developed methods are applied to train, in a rather principled way, neural networks with weights that are constrained to take values in a discrete set. We obtain competitive performance and, interestingly, very sparse weights in the case of ternary networks, which may be useful for model deployment on low-memory devices.}
}
Endnote
%0 Conference Paper
%T An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks
%A Qianxiao Li
%A Shuji Hao
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-li18b
%I PMLR
%P 2985--2994
%U https://proceedings.mlr.press/v80/li18b.html
%V 80
%X Deep learning is formulated as a discrete-time optimal control problem. This allows one to characterize necessary conditions for optimality and develop training algorithms that do not rely on gradients with respect to the trainable parameters. In particular, we introduce the discrete-time method of successive approximations (MSA), which is based on Pontryagin’s maximum principle, for training neural networks. A rigorous error estimate for the discrete MSA is obtained, which sheds light on its dynamics and the means to stabilize the algorithm. The developed methods are applied to train, in a rather principled way, neural networks with weights that are constrained to take values in a discrete set. We obtain competitive performance and, interestingly, very sparse weights in the case of ternary networks, which may be useful for model deployment on low-memory devices.
APA
Li, Q. & Hao, S. (2018). An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2985-2994. Available from https://proceedings.mlr.press/v80/li18b.html.
