Thinking of Neural Networks Like a Physicist: The Statistical Physics of Machine Learning
Proceedings of the Analytical Connectionism Schools 2023--2024, PMLR 320:15-41, 2026.
Abstract
Machine learning (ML) enables us to uncover patterns in data and to generalize them to new, unseen examples. The field's rapid development has transformed not only classical computer science domains, such as computer vision, natural language processing, and speech recognition, but has also begun to reshape scientific research more broadly, including psychology and neuroscience. This paper presents a pedagogical introduction to an emerging line of research, presented by Florent Krzakala at Analytical Connectionism 2023, that seeks to interpret ML systems by “thinking like a physicist.” In particular, the methods and intuitions of statistical physics, which has a long history of studying complex systems, can be fruitfully applied to the high-dimensional problems encountered in ML. First, the paper presents applications of statistical physics techniques to unsupervised machine learning, in which patterns are found in data without any supervisory signal. The replica method, an important approximation that allows the expected value of the logarithm of the problem's likelihood ratio to be computed efficiently, greatly facilitates the analysis of classic unsupervised learning problems such as sparse signal denoising and clustering. The approximate message passing (AMP) algorithm provides an iterative approach to solving these problems. Second, the paper turns to the supervised learning setting, in which ground-truth training labels are used to train a learning algorithm. It characterizes the learning dynamics of neural networks with a single hidden layer in two regimes. In the lazy learning regime, learning occurs only in the readout layer of the network while the embedding weights stay fixed. With an infinitely wide hidden layer, this corresponds to the neural tangent kernel regime, in which the network behaves linearly in its features and the possible solutions to the learning problem can be characterized.
Meanwhile, in the feature learning regime, learning occurs in all weights, including the embeddings. The paper ends with a brief discussion of current research going beyond single-sample stochastic gradient descent and an introduction to applications of the concepts outlined here to cognitive psychology.
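To make the distinction between the two regimes concrete, the following is a minimal NumPy sketch (not taken from the paper; the network width, learning rate, step count, and toy teacher function are illustrative assumptions) of a one-hidden-layer network trained by full-batch gradient descent. The "lazy" variant updates only the readout weights with the embedding weights frozen; the "feature" variant updates the embedding weights as well:

```python
import numpy as np

# Illustrative sketch of lazy vs. feature learning in a one-hidden-layer
# network y_hat = a . relu(W x), trained by gradient descent on mean
# squared error for a toy regression task. All hyperparameters are
# assumptions chosen for this example, not values from the paper.
rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def train(mode, n=200, d=5, width=50, lr=0.05, steps=500):
    X = rng.standard_normal((n, d))
    y = np.sin(X @ rng.standard_normal(d))  # toy teacher signal
    W = rng.standard_normal((width, d)) / np.sqrt(d)   # embedding weights
    a = rng.standard_normal(width) / np.sqrt(width)    # readout weights
    for _ in range(steps):
        H = relu(X @ W.T)          # hidden activations, shape (n, width)
        err = H @ a - y            # residuals, shape (n,)
        a -= lr * (H.T @ err) / n  # readout update (both regimes)
        if mode == "feature":
            # embedding weights also learn in the feature learning regime
            grad_W = ((err[:, None] * (H > 0) * a).T @ X) / n
            W -= lr * grad_W
    return np.mean((relu(X @ W.T) @ a - y) ** 2)  # final training MSE


lazy_loss = train("lazy")
feature_loss = train("feature")
```

Running both modes from the same kind of initialization makes the contrast visible: the lazy network can only recombine its fixed random features, while the feature-learning network also adapts the features themselves.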