Global optimality of Elman-type RNNs in the mean-field regime

Andrea Agazzi; Jianfeng Lu; Sayan Mukherjee

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:196-227, 2023.

Abstract

We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-agazzi23a,
  title     = {Global optimality of Elman-type {RNN}s in the mean-field regime},
  author    = {Agazzi, Andrea and Lu, Jianfeng and Mukherjee, Sayan},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {196--227},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/agazzi23a/agazzi23a.pdf},
  url       = {https://proceedings.mlr.press/v202/agazzi23a.html},
  abstract  = {We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.}
}

Endnote

%0 Conference Paper
%T Global optimality of Elman-type RNNs in the mean-field regime
%A Andrea Agazzi
%A Jianfeng Lu
%A Sayan Mukherjee
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-agazzi23a
%I PMLR
%P 196--227
%U https://proceedings.mlr.press/v202/agazzi23a.html
%V 202
%X We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.

APA


Agazzi, A., Lu, J. & Mukherjee, S. (2023). Global optimality of Elman-type RNNs in the mean-field regime. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:196-227. Available from https://proceedings.mlr.press/v202/agazzi23a.html.
