Deep equilibrium models as estimators for continuous latent variables

Russell Tsuchida, Cheng Soon Ong
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:1646-1671, 2023.

Abstract

Principal Component Analysis (PCA) and its exponential family extensions have three components: observations, latents and the parameters of a linear transformation. We consider a generalised setting where the canonical parameters of the exponential family are a nonlinear transformation of the latents. We show explicit relationships between particular neural network architectures and the corresponding statistical models. We find that deep equilibrium models (DEQs), a recently introduced class of implicit neural networks, compute maximum a posteriori (MAP) estimates of the latents and the parameters of the transformation. Our analysis provides a systematic way to relate activation functions, dropout, and layer structure to statistical assumptions about the observations, thus providing foundational principles for unsupervised DEQs. For hierarchical latents, individual neurons can be interpreted as nodes in a deep graphical model. Our DEQ feature maps are end-to-end differentiable, enabling fine-tuning for downstream tasks.
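For readers unfamiliar with DEQs, the sketch below illustrates the basic computation the abstract refers to: a layer whose output is defined implicitly as a fixed point z* = f(z*, x) of an update map, rather than by stacking finitely many layers. This is a minimal illustration, not the paper's method: the map f, the parameters W, U, b, and the naive iteration in deq_forward are hypothetical placeholders, and practical DEQs typically use accelerated root-finders such as Anderson acceleration or Broyden's method.

    import numpy as np

    rng = np.random.default_rng(0)
    d_latent, d_obs = 8, 16
    # Keep W small so that f is a contraction in z and naive iteration converges.
    W = rng.normal(scale=0.1, size=(d_latent, d_latent))
    U = rng.normal(scale=0.5, size=(d_latent, d_obs))
    b = np.zeros(d_latent)

    def f(z, x):
        # One application of the implicit layer's update map.
        return np.tanh(W @ z + U @ x + b)

    def deq_forward(x, tol=1e-8, max_iter=500):
        # Solve for the fixed point z* = f(z*, x) by repeated application of f.
        z = np.zeros(d_latent)
        for _ in range(max_iter):
            z_next = f(z, x)
            if np.linalg.norm(z_next - z) < tol:
                return z_next
            z = z_next
        return z

    x = rng.normal(size=d_obs)
    z_star = deq_forward(x)
    print(np.linalg.norm(z_star - f(z_star, x)))  # ~0: z_star is a fixed point

Constraining the spectral norm of W mirrors the well-posedness conditions usually imposed on DEQs so that a unique fixed point exists; the paper's contribution is to identify maps f for which such fixed points coincide with MAP estimates of continuous latent variables.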

Cite this Paper

BibTeX
@InProceedings{pmlr-v206-tsuchida23a,
  title     = {Deep equilibrium models as estimators for continuous latent variables},
  author    = {Tsuchida, Russell and Ong, Cheng Soon},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {1646--1671},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/tsuchida23a/tsuchida23a.pdf},
  url       = {https://proceedings.mlr.press/v206/tsuchida23a.html}
}
Endnote
%0 Conference Paper
%T Deep equilibrium models as estimators for continuous latent variables
%A Russell Tsuchida
%A Cheng Soon Ong
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-tsuchida23a
%I PMLR
%P 1646--1671
%U https://proceedings.mlr.press/v206/tsuchida23a.html
%V 206
APA
Tsuchida, R. & Ong, C.S. (2023). Deep equilibrium models as estimators for continuous latent variables. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:1646-1671. Available from https://proceedings.mlr.press/v206/tsuchida23a.html.
