Global optimality of Elman-type RNNs in the mean-field regime

Andrea Agazzi, Jianfeng Lu, Sayan Mukherjee
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:196-227, 2023.

Abstract

We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.
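For intuition, here is a minimal sketch (in Python/NumPy) of the kind of network the abstract refers to: an Elman recursion h_t = tanh(W h_{t-1} + U x_t) whose scalar output averages the hidden units. The 1/N readout scaling is the hallmark of the mean-field regime, as opposed to the 1/sqrt(N) scaling of the NTK regime; the activation, dimensions, and the 1/N scaling of the recurrent coupling below are illustrative assumptions, not the paper's exact setup.

    # Illustrative mean-field Elman RNN. Width N, the tanh activation, and
    # the 1/N recurrent scaling are assumptions for this sketch only.
    import numpy as np

    rng = np.random.default_rng(0)
    N, d_in, T = 512, 4, 10          # hidden width, input dim, sequence length
    W = rng.normal(size=(N, N)) / N  # recurrent weights, mean-field coupling
    U = rng.normal(size=(N, d_in))   # input-to-hidden weights
    c = rng.normal(size=N)           # readout weights

    def forward(xs):
        # Elman recursion h_t = tanh(W h_{t-1} + U x_t), then a mean-field
        # readout y = (1/N) * sum_i c_i * h_T[i]: an average over units,
        # not the 1/sqrt(N) normalization of the NTK parameterization.
        h = np.zeros(N)
        for x in xs:
            h = np.tanh(W @ h + U @ x)
        return c @ h / N

    xs = rng.normal(size=(T, d_in))  # a random input sequence
    print(forward(xs))

Under this averaged parameterization, the network output becomes an integral against the empirical distribution of the weights as N grows, which is what makes the mean-field (Wasserstein gradient flow) analysis of the training dynamics possible.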

Cite this Paper

BibTeX
@InProceedings{pmlr-v202-agazzi23a,
  title     = {Global optimality of Elman-type {RNN}s in the mean-field regime},
  author    = {Agazzi, Andrea and Lu, Jianfeng and Mukherjee, Sayan},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {196--227},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/agazzi23a/agazzi23a.pdf},
  url       = {https://proceedings.mlr.press/v202/agazzi23a.html},
  abstract  = {We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.}
}
Endnote
%0 Conference Paper
%T Global optimality of Elman-type RNNs in the mean-field regime
%A Andrea Agazzi
%A Jianfeng Lu
%A Sayan Mukherjee
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-agazzi23a
%I PMLR
%P 196--227
%U https://proceedings.mlr.press/v202/agazzi23a.html
%V 202
%X We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.
APA
Agazzi, A., Lu, J., & Mukherjee, S. (2023). Global optimality of Elman-type RNNs in the mean-field regime. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:196-227. Available from https://proceedings.mlr.press/v202/agazzi23a.html.