[edit]
Length independent PAC-Bayes bounds for Simple RNNs
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3547-3555, 2024.
Abstract
While the practical interest of Recurrent neural networks (RNNs) is attested, much remains to be done to develop a thorough theoretical understanding of their abilities, particularly in what concerns their learning capacities. A powerful framework to tackle this question is the one of PAC-Bayes theory, which allows one to derive bounds providing guarantees on the expected performance of learning models on unseen data. In this paper, we provide an extensive study on the conditions leading to PAC-Bayes bounds for non-linear RNNs that are independent of the length of the data. The derivation of our results relies on a perturbation analysis on the weights of the network. We prove bounds that hold for \emph{$\beta$-saturated} and \emph{DS $\beta$-saturated} SRNs, classes of RNNs we introduce to formalize saturation regimes of RNNs. The first regime corresponds to the case where the values of the hidden state of the SRN are always close to the boundaries of the activation functions. The second one, closely related to practical observations, only requires that it happens at least once in each component of the hidden state on a sliding window of a given size.