The Effective Number of Shared Dimensions Between Paired Datasets
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4249-4257, 2024.
Abstract
A number of recent studies have sought to understand the behavior of both artificial and biological neural networks by comparing representations across layers, networks and brain areas. Increasingly prevalent, too, are comparisons across modalities of data, such as neural network activations and training data, or behavioral data and neurophysiological recordings. One approach to such comparisons involves measuring the dimensionality of the space shared between the paired data matrices, where dimensionality serves as a proxy for computational or representational complexity. Established approaches, including CCA, can be used to measure the number of shared embedding dimensions; however, they do not account for potentially unequal variance along shared dimensions and so cannot measure effective shared dimensionality. We present a candidate measure for shared dimensionality that we call the effective number of shared dimensions (ENSD). The ENSD is an interpretable and computationally efficient model-free measure of shared dimensionality that can be used to probe shared structure in a wide variety of data types. We demonstrate the relative robustness of the ENSD in cases where data are sparse or low rank, and illustrate how the ENSD can be applied in a variety of analyses of representational similarities across layers in convolutional neural networks and between brain regions.