Using Sliced Mutual Information to Study Memorization and Generalization in Deep Neural Networks
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:11608-11629, 2023.
Abstract
In this paper, we study the memorization and generalization behaviour of deep neural networks (DNNs) using sliced mutual information (SMI), defined as the average of the mutual information (MI) between one-dimensional random projections of the variables. We argue that the SMI between the features in a DNN ($T$) and the ground-truth labels ($Y$), $SI(T;Y)$, can be seen as a form of usable information that the features contain about the labels. We show theoretically that $SI(T;Y)$ can encode geometric properties of the feature distribution, such as its spherical soft margin and intrinsic dimensionality, in a way that MI cannot. Additionally, we present empirical evidence showing how $SI(T;Y)$ can capture memorization and generalization in DNNs. In particular, we find that, in the presence of label noise, all layers start to memorize, but the earlier layers stabilize more quickly than the deeper ones. Finally, we point out that, in the context of Bayesian neural networks, the SMI between the penultimate layer and the output represents the worst-case uncertainty of the network's output.
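For concreteness, the sketch below shows one straightforward Monte Carlo estimator of $SI(T;Y)$ for discrete labels: draw directions uniformly from the unit sphere, project the features onto each direction, estimate the MI between each one-dimensional projection and the labels, and average. The function name, the k-NN MI estimator from scikit-learn, and the parameter defaults are illustrative assumptions, not necessarily the estimator used in the paper.

```python
# A minimal sketch of a Monte Carlo estimator for SI(T; Y) with discrete
# labels. Assumes scikit-learn's k-NN based MI estimator; the paper's
# exact estimation procedure may differ.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def sliced_mutual_information(features, labels, n_projections=128, seed=0):
    """Estimate SI(T; Y) = E_theta[ I(theta^T T ; Y) ] by Monte Carlo.

    features: (n_samples, d) array of layer activations T
    labels:   (n_samples,) array of discrete class labels Y
    """
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    # Draw directions uniformly from the unit sphere S^{d-1} by
    # normalizing standard Gaussian samples.
    thetas = rng.standard_normal((d, n_projections))
    thetas /= np.linalg.norm(thetas, axis=0, keepdims=True)
    # Each column of `projected` is one 1-D projection theta^T T.
    projected = features @ thetas
    # k-NN based MI estimate between each 1-D projection and Y.
    mi_per_direction = mutual_info_classif(projected, labels, random_state=seed)
    return mi_per_direction.mean()
```

Applying such an estimator to the activations of each layer over training epochs is one way to probe the layer-wise memorization dynamics described above.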