Global Convergence of Over-parameterized Deep Equilibrium Models

Zenan Ling, Xingyu Xie, Qiuhao Wang, Zongpeng Zhang, Zhouchen Lin
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:767-787, 2023.

Abstract

A deep equilibrium model (DEQ) is defined implicitly through an equilibrium point of an infinite-depth weight-tied model with input injection. Instead of performing infinitely many layer computations, it solves for the equilibrium point directly with root-finding and computes gradients with implicit differentiation. In this paper, we investigate the training dynamics of over-parameterized DEQs and propose a novel probabilistic framework to overcome the challenges arising from weight sharing and infinite depth. Under a condition on the initial equilibrium point, we prove that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss. We further perform a fine-grained non-asymptotic analysis of random DEQs and their weight-untied counterparts, and show that the required initial condition is satisfied under mild over-parameterization. Moreover, we show that a unique equilibrium point exists throughout training.
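To make the mechanism the abstract refers to concrete, the following minimal NumPy sketch shows a DEQ-style layer: the forward pass solves for the equilibrium point with a simple fixed-point root-finder, and the backward pass obtains parameter gradients by implicit differentiation instead of unrolling the infinite-depth computation. The update f, the tanh nonlinearity, the variable names, and the shapes are illustrative assumptions for exposition, not the paper's construction or code.

```python
import numpy as np

# Minimal sketch of a DEQ layer (illustrative assumptions, not the authors' code).
# The layer output is the equilibrium point z* of the weight-tied update
# f(z, x) = tanh(W z + U x), where U x is the input injection.

def f(z, x, W, U):
    return np.tanh(W @ z + U @ x)

def forward_equilibrium(x, W, U, tol=1e-8, max_iter=500):
    """Solve z = f(z, x) directly with a naive fixed-point root-finder."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_new = f(z, x, W, U)
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

def backward_implicit(z_star, x, W, U, grad_z):
    """Gradients via implicit differentiation: with J = df/dz at z*,
    dL/dtheta = grad_z^T (I - J)^{-1} df/dtheta, so one linear system replaces
    backpropagation through the (infinitely) unrolled iterations."""
    pre = W @ z_star + U @ x
    D = np.diag(1.0 - np.tanh(pre) ** 2)                        # tanh'(pre) on the diagonal
    J = D @ W                                                   # Jacobian of f w.r.t. z at z*
    v = np.linalg.solve((np.eye(len(z_star)) - J).T, grad_z)    # adjoint vector
    grad_W = np.outer(D @ v, z_star)                            # dL/dW
    grad_U = np.outer(D @ v, x)                                 # dL/dU
    return grad_W, grad_U

# Toy usage: one gradient evaluation for the quadratic loss ||z* - y||^2 / 2.
rng = np.random.default_rng(0)
n, d = 8, 4
W = 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)   # small norm keeps f contractive
U = rng.standard_normal((n, d)) / np.sqrt(d)
x, y = rng.standard_normal(d), rng.standard_normal(n)

z_star = forward_equilibrium(x, W, U)
grad_W, grad_U = backward_implicit(z_star, x, W, U, grad_z=z_star - y)
```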

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-ling23a,
  title     = {Global Convergence of Over-parameterized Deep Equilibrium Models},
  author    = {Ling, Zenan and Xie, Xingyu and Wang, Qiuhao and Zhang, Zongpeng and Lin, Zhouchen},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {767--787},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/ling23a/ling23a.pdf},
  url       = {https://proceedings.mlr.press/v206/ling23a.html},
  abstract  = {A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection. Instead of infinite computations, it solves an equilibrium point directly with root-finding and computes gradients with implicit differentiation. In this paper, the training dynamics of over-parameterized DEQs are investigated, and we propose a novel probabilistic framework to overcome the challenge arising from the weight-sharing and the infinite depth. By supposing a condition on the initial equilibrium point, we prove that the gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. We further perform a fine-grained non-asymptotic analysis about random DEQs and the corresponding weight-untied models, and show that the required initial condition is satisfied via mild over-parameterization. Moreover, we show that the unique equilibrium point always exists during the training.}
}
Endnote
%0 Conference Paper
%T Global Convergence of Over-parameterized Deep Equilibrium Models
%A Zenan Ling
%A Xingyu Xie
%A Qiuhao Wang
%A Zongpeng Zhang
%A Zhouchen Lin
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-ling23a
%I PMLR
%P 767--787
%U https://proceedings.mlr.press/v206/ling23a.html
%V 206
%X A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection. Instead of infinite computations, it solves an equilibrium point directly with root-finding and computes gradients with implicit differentiation. In this paper, the training dynamics of over-parameterized DEQs are investigated, and we propose a novel probabilistic framework to overcome the challenge arising from the weight-sharing and the infinite depth. By supposing a condition on the initial equilibrium point, we prove that the gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. We further perform a fine-grained non-asymptotic analysis about random DEQs and the corresponding weight-untied models, and show that the required initial condition is satisfied via mild over-parameterization. Moreover, we show that the unique equilibrium point always exists during the training.
APA
Ling, Z., Xie, X., Wang, Q., Zhang, Z., & Lin, Z. (2023). Global Convergence of Over-parameterized Deep Equilibrium Models. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:767-787. Available from https://proceedings.mlr.press/v206/ling23a.html.
