Length independent PAC-Bayes bounds for Simple RNNs

Volodimir Mitarchuk, Clara Lacroce, Rémi Eyraud, Rémi Emonet, Amaury Habrard, Guillaume Rabusseau
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3547-3555, 2024.

Abstract

While the practical utility of recurrent neural networks (RNNs) is well attested, much remains to be done to develop a thorough theoretical understanding of their abilities, particularly concerning their learning capacities. A powerful framework for tackling this question is PAC-Bayes theory, which allows one to derive bounds providing guarantees on the expected performance of learning models on unseen data. In this paper, we provide an extensive study of the conditions leading to PAC-Bayes bounds for non-linear RNNs that are independent of the length of the data. The derivation of our results relies on a perturbation analysis of the weights of the network. We prove bounds that hold for \emph{$\beta$-saturated} and \emph{DS $\beta$-saturated} simple recurrent networks (SRNs), classes of RNNs we introduce to formalize saturation regimes of RNNs. The first regime corresponds to the case where the values of the hidden state of the SRN are always close to the boundaries of the activation function. The second, closely related to practical observations, only requires that this happens at least once in each component of the hidden state within a sliding window of a given size.
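For background, a classical PAC-Bayes bound of the McAllester type (not the paper's length-independent result, and with constants that vary across statements in the literature) reads as follows: for a prior $P$, confidence $\delta \in (0,1)$, and an i.i.d. sample $S$ of size $m$, with probability at least $1-\delta$, for every posterior $Q$,

$$
\mathop{\mathbb{E}}_{h \sim Q}\big[L(h)\big] \;\le\; \mathop{\mathbb{E}}_{h \sim Q}\big[\hat{L}_S(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}},
$$

where $L$ and $\hat{L}_S$ denote the true and empirical risks. For RNNs the difficulty is that naive perturbation arguments make the KL or Lipschitz terms grow with the input sequence length; the saturation conditions studied in the paper are what remove that dependence.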
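To make the two saturation regimes concrete, here is a minimal, hypothetical sketch of how they could be checked numerically for tanh hidden states. The function names, the boundary value of $1$, and the exact inequalities are illustrative assumptions, not the paper's formal definitions.

```python
# Illustrative sketch (not the paper's formal definitions): check two
# saturation notions for a sequence of tanh hidden states, each represented
# as a list of floats in (-1, 1).

def is_beta_saturated(states, beta):
    """Every component of every hidden state lies within beta of a tanh
    boundary, i.e. |h_j| >= 1 - beta at every time step."""
    return all(abs(x) >= 1.0 - beta for h in states for x in h)

def is_ds_beta_saturated(states, beta, window):
    """In every sliding window of `window` consecutive hidden states, each
    component comes within beta of a boundary at least once."""
    dim = len(states[0])
    for start in range(len(states) - window + 1):
        for j in range(dim):
            if not any(abs(states[t][j]) >= 1.0 - beta
                       for t in range(start, start + window)):
                return False
    return True

# Example: each component saturates at least once per window of size 2,
# but not at every single step.
states = [[0.99, 0.10], [0.20, -0.98], [-0.97, 0.15], [0.30, 0.99]]
print(is_beta_saturated(states, 0.05))        # False
print(is_ds_beta_saturated(states, 0.05, 2))  # True
```

The example illustrates why the DS (sliding-window) regime is the weaker, more practice-aligned condition: the same trajectory fails the everywhere-saturated test yet passes the windowed one.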

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-mitarchuk24a,
  title     = {Length independent {PAC}-{B}ayes bounds for Simple {RNNs}},
  author    = {Mitarchuk, Volodimir and Lacroce, Clara and Eyraud, R\'{e}mi and Emonet, R\'{e}mi and Habrard, Amaury and Rabusseau, Guillaume},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {3547--3555},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/mitarchuk24a/mitarchuk24a.pdf},
  url       = {https://proceedings.mlr.press/v238/mitarchuk24a.html},
  abstract  = {While the practical interest of Recurrent neural networks (RNNs) is attested, much remains to be done to develop a thorough theoretical understanding of their abilities, particularly in what concerns their learning capacities. A powerful framework to tackle this question is the one of PAC-Bayes theory, which allows one to derive bounds providing guarantees on the expected performance of learning models on unseen data. In this paper, we provide an extensive study on the conditions leading to PAC-Bayes bounds for non-linear RNNs that are independent of the length of the data. The derivation of our results relies on a perturbation analysis on the weights of the network. We prove bounds that hold for \emph{$\beta$-saturated} and \emph{DS $\beta$-saturated} SRNs, classes of RNNs we introduce to formalize saturation regimes of RNNs. The first regime corresponds to the case where the values of the hidden state of the SRN are always close to the boundaries of the activation functions. The second one, closely related to practical observations, only requires that it happens at least once in each component of the hidden state on a sliding window of a given size.}
}
Endnote
%0 Conference Paper
%T Length independent PAC-Bayes bounds for Simple RNNs
%A Volodimir Mitarchuk
%A Clara Lacroce
%A Rémi Eyraud
%A Rémi Emonet
%A Amaury Habrard
%A Guillaume Rabusseau
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-mitarchuk24a
%I PMLR
%P 3547--3555
%U https://proceedings.mlr.press/v238/mitarchuk24a.html
%V 238
%X While the practical interest of Recurrent neural networks (RNNs) is attested, much remains to be done to develop a thorough theoretical understanding of their abilities, particularly in what concerns their learning capacities. A powerful framework to tackle this question is the one of PAC-Bayes theory, which allows one to derive bounds providing guarantees on the expected performance of learning models on unseen data. In this paper, we provide an extensive study on the conditions leading to PAC-Bayes bounds for non-linear RNNs that are independent of the length of the data. The derivation of our results relies on a perturbation analysis on the weights of the network. We prove bounds that hold for \emph{$\beta$-saturated} and \emph{DS $\beta$-saturated} SRNs, classes of RNNs we introduce to formalize saturation regimes of RNNs. The first regime corresponds to the case where the values of the hidden state of the SRN are always close to the boundaries of the activation functions. The second one, closely related to practical observations, only requires that it happens at least once in each component of the hidden state on a sliding window of a given size.
APA
Mitarchuk, V., Lacroce, C., Eyraud, R., Emonet, R., Habrard, A. & Rabusseau, G. (2024). Length independent PAC-Bayes bounds for Simple RNNs. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3547-3555. Available from https://proceedings.mlr.press/v238/mitarchuk24a.html.