Global Convergence of Over-parameterized Deep Equilibrium Models
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:767-787, 2023.
Abstract
A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with input injection. Instead of performing infinitely many layer computations, it solves for the equilibrium point directly via root-finding and computes gradients by implicit differentiation. In this paper, we investigate the training dynamics of over-parameterized DEQs and propose a novel probabilistic framework to overcome the challenges arising from weight-sharing and the infinite depth. Under a condition on the initial equilibrium point, we prove that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss. We further perform a fine-grained non-asymptotic analysis of randomly initialized DEQs and the corresponding weight-untied models, and show that the required initial condition is satisfied under mild over-parameterization. Moreover, we show that a unique equilibrium point always exists throughout training.
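For concreteness, the standard DEQ formulation referenced in the abstract can be sketched as follows; the notation $f_\theta$, $x$, $z^\star$ is assumed here for illustration and is not taken from the paper. A DEQ with parameters $\theta$ and input injection $x$ defines its output as the fixed point of a weight-tied layer,
$$ z^\star = f_\theta(z^\star, x), $$
which corresponds to the limit of the infinite-depth iteration $z^{l+1} = f_\theta(z^l, x)$ when that limit exists. Assuming $I - \partial f_\theta / \partial z$ is invertible at $z^\star$, the implicit function theorem gives the gradient used for training,
$$ \frac{\partial z^\star}{\partial \theta} = \left( I - \frac{\partial f_\theta}{\partial z}\Big|_{z^\star} \right)^{-1} \frac{\partial f_\theta}{\partial \theta}\Big|_{z^\star}, $$
so backpropagation requires only the equilibrium point itself, not the iterates of the root-finding procedure that produced it.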