When Representations Align: Universality in Representation Learning Dynamics

Loek Van Rossem, Andrew M Saxe
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:49098-49121, 2024.

Abstract

Deep neural networks come in many sizes and architectures. The choice of architecture, in conjunction with the dataset and learning algorithm, is commonly understood to affect the learned neural representations. Yet, recent results have shown that different architectures learn representations with striking qualitative similarities. Here we derive an effective theory of representation learning under the assumption that the encoding map from input to hidden representation and the decoding map from representation to output are arbitrary smooth functions. This theory schematizes representation learning dynamics in the regime of complex, large architectures, where hidden representations are not strongly constrained by the parametrization. We show through experiments that the effective theory describes aspects of representation learning dynamics across a range of deep networks with different activation functions and architectures, and exhibits phenomena similar to the “rich” and “lazy” regime. While many network behaviors depend quantitatively on architecture, our findings point to certain behaviors that are widely conserved once models are sufficiently flexible.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-van-rossem24a,
  title     = {When Representations Align: Universality in Representation Learning Dynamics},
  author    = {Van Rossem, Loek and Saxe, Andrew M},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {49098--49121},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/van-rossem24a/van-rossem24a.pdf},
  url       = {https://proceedings.mlr.press/v235/van-rossem24a.html},
  abstract  = {Deep neural networks come in many sizes and architectures. The choice of architecture, in conjunction with the dataset and learning algorithm, is commonly understood to affect the learned neural representations. Yet, recent results have shown that different architectures learn representations with striking qualitative similarities. Here we derive an effective theory of representation learning under the assumption that the encoding map from input to hidden representation and the decoding map from representation to output are arbitrary smooth functions. This theory schematizes representation learning dynamics in the regime of complex, large architectures, where hidden representations are not strongly constrained by the parametrization. We show through experiments that the effective theory describes aspects of representation learning dynamics across a range of deep networks with different activation functions and architectures, and exhibits phenomena similar to the “rich” and “lazy” regime. While many network behaviors depend quantitatively on architecture, our findings point to certain behaviors that are widely conserved once models are sufficiently flexible.}
}
Endnote
%0 Conference Paper
%T When Representations Align: Universality in Representation Learning Dynamics
%A Loek Van Rossem
%A Andrew M Saxe
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-van-rossem24a
%I PMLR
%P 49098--49121
%U https://proceedings.mlr.press/v235/van-rossem24a.html
%V 235
%X Deep neural networks come in many sizes and architectures. The choice of architecture, in conjunction with the dataset and learning algorithm, is commonly understood to affect the learned neural representations. Yet, recent results have shown that different architectures learn representations with striking qualitative similarities. Here we derive an effective theory of representation learning under the assumption that the encoding map from input to hidden representation and the decoding map from representation to output are arbitrary smooth functions. This theory schematizes representation learning dynamics in the regime of complex, large architectures, where hidden representations are not strongly constrained by the parametrization. We show through experiments that the effective theory describes aspects of representation learning dynamics across a range of deep networks with different activation functions and architectures, and exhibits phenomena similar to the “rich” and “lazy” regime. While many network behaviors depend quantitatively on architecture, our findings point to certain behaviors that are widely conserved once models are sufficiently flexible.
APA
Van Rossem, L. & Saxe, A. M. (2024). When Representations Align: Universality in Representation Learning Dynamics. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:49098-49121. Available from https://proceedings.mlr.press/v235/van-rossem24a.html.