On the Stepwise Nature of Self-Supervised Learning
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:31852-31876, 2023.
Abstract
We present a simple picture of the training process of self-supervised learning methods with dual deep networks. In our picture, these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps. We arrive at this picture via the study of a linear toy model of Barlow Twins, applicable to the case in which the trained network is infinitely wide. We solve the training dynamics of our toy model from small initialization, finding that the model learns the top eigenmodes of a certain contrastive kernel in a discrete, stepwise fashion, and obtaining a closed-form expression for the final learned representations. Remarkably, we see the same stepwise learning phenomenon when training deep ResNets using the Barlow Twins, SimCLR, and VICReg losses. This stepwise picture partially demystifies the process of self-supervised training.
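To make the stepwise picture concrete, here is a minimal numerical sketch of the linearized setup the abstract describes: a linear encoder z = Wx trained from small initialization on an unnormalized Barlow Twins-style objective ||C − I||²_F, where C is the cross-correlation matrix of the embeddings of two augmented views. This is not the paper's actual code; the synthetic Gaussian data, the equal weighting of diagonal and off-diagonal loss terms, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a decaying input spectrum (assumed setup), so the
# eigenvalues of the view cross-correlation are well separated and the
# learning steps are visibly distinct.
n, d_in, d_out = 4096, 16, 4
spectrum = 1.0 / np.arange(1, d_in + 1)
X = rng.standard_normal((n, d_in)) * np.sqrt(spectrum)
noise = 0.1
X1 = X + noise * rng.standard_normal(X.shape)  # view 1
X2 = X + noise * rng.standard_normal(X.shape)  # view 2

# Symmetrized cross-correlation of the two views; its top eigenmodes
# play the role of the "contrastive kernel" modes in this sketch.
G = (X1.T @ X2) / n
G = 0.5 * (G + G.T)

# Linear encoder z = W x, starting from small initialization.
W = 1e-4 * rng.standard_normal((d_out, d_in))

lr, steps = 0.005, 3000
history = []
for t in range(steps):
    C = W @ G @ W.T  # embedding cross-correlation matrix
    # Gradient of ||C - I||_F^2 with respect to W (G, C symmetric).
    W -= lr * 4.0 * (C - np.eye(d_out)) @ W @ G
    if t % 150 == 0:
        history.append(np.sort(np.linalg.eigvalsh(C))[::-1])

# Each eigenvalue of C rises from ~0 to ~1 at a distinct time:
# the embedding dimensions are learned one at a time.
for t, eig in zip(range(0, steps, 150), history):
    print(t, np.round(eig, 3))
```

Printing the eigenvalues of C over training shows each one saturating at a different, well-separated time, reproducing in miniature the discrete learning steps the abstract describes for deep networks.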