Global Convergence of Block Coordinate Descent in Deep Learning

Jinshan Zeng, Tim Tsz-Kit Lau, Shaobo Lin, Yuan Yao
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7313-7323, 2019.

Abstract

Deep learning has attracted extensive attention due to its great empirical success. The efficiency of block coordinate descent (BCD) methods has recently been demonstrated in deep neural network (DNN) training. However, theoretical studies of their convergence properties are limited because of the highly nonconvex nature of DNN training. In this paper, we aim to provide a general methodology for establishing provable convergence guarantees for this class of methods. In particular, for most of the commonly used DNN training models involving both two- and three-splitting schemes, we establish global convergence to a critical point at a rate of ${\cal O}(1/k)$, where $k$ is the number of iterations. The results extend to general loss functions with Lipschitz continuous gradients and to deep residual networks (ResNets). Our key development adds several new elements to the Kurdyka-Łojasiewicz inequality framework, enabling a global convergence analysis of BCD in the general scenario of deep learning.
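For context, the splitting viewpoint behind these BCD methods can be sketched as follows; the notation ($W_i$ for the layer-$i$ weights, $V_i$ for activations, $U_i$ for pre-activations, $\sigma$ the activation function, $\gamma > 0$ a penalty parameter, $r_i$ optional regularizers) is an illustrative assumption and is not quoted from the paper. A three-splitting reformulation of $N$-layer DNN training replaces the nested layer composition with auxiliary variables and quadratic penalties:

$$\min_{\{W_i\},\{U_i\},\{V_i\}} \ \ell(V_N; Y) + \sum_{i=1}^{N} r_i(W_i) + \frac{\gamma}{2} \sum_{i=1}^{N} \left( \|V_i - \sigma(U_i)\|_F^2 + \|U_i - W_i V_{i-1}\|_F^2 \right), \qquad V_0 := X,$$

where $X$ denotes the input data and $Y$ the labels. A BCD method then updates the blocks $\{V_i\}$, $\{U_i\}$ and $\{W_i\}$ cyclically, with each subproblem reducing to a simple proximal or least-squares step; a two-splitting variant drops the $U_i$ variables and penalizes $\|V_i - \sigma(W_i V_{i-1})\|_F^2$ directly.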

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-zeng19a,
  title     = {Global Convergence of Block Coordinate Descent in Deep Learning},
  author    = {Zeng, Jinshan and Lau, Tim Tsz-Kit and Lin, Shaobo and Yao, Yuan},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {7313--7323},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/zeng19a/zeng19a.pdf},
  url       = {https://proceedings.mlr.press/v97/zeng19a.html}
}
Endnote
%0 Conference Paper
%T Global Convergence of Block Coordinate Descent in Deep Learning
%A Jinshan Zeng
%A Tim Tsz-Kit Lau
%A Shaobo Lin
%A Yuan Yao
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-zeng19a
%I PMLR
%P 7313--7323
%U https://proceedings.mlr.press/v97/zeng19a.html
%V 97
APA
Zeng, J., Lau, T.T., Lin, S. & Yao, Y. (2019). Global Convergence of Block Coordinate Descent in Deep Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7313-7323. Available from https://proceedings.mlr.press/v97/zeng19a.html.
