On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry

Mohammad Tinati, Stephen Tu
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:6197-6309, 2026.

Abstract

Self-supervised pre-training, where large corpora of unlabeled data are used to learn representations for downstream fine-tuning, has become a cornerstone of modern machine learning. While a growing body of work has begun to analyze this paradigm, existing bounds leave open the question of how sharp current rates are, and whether they accurately capture the complex interaction between pre-training and fine-tuning. In this paper, we address this gap by developing an asymptotic theory of pre-training via two-stage $M$-estimation. A key challenge is that the pre-training estimator is often identifiable only up to a group symmetry, a feature common in representation learning that requires careful treatment. We address this issue using tools from Riemannian geometry to study the \emph{intrinsic} parameters of the pre-training representation, which we link with the downstream predictor through a notion of \emph{orbit-invariance}, precisely characterizing the limiting distribution of the downstream test risk. We apply our results to spectral pre-training, factor models, and Gaussian mixture models, obtaining substantial improvements in problem-specific factors over prior art when applicable.

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-tinati26a,
  title     = {On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry},
  author    = {Tinati, Mohammad and Tu, Stephen},
  booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages     = {6197--6309},
  year      = {2026},
  editor    = {Hanneke, Steve and Lattimore, Tor},
  volume    = {336},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/tinati26a/tinati26a.pdf},
  url       = {https://proceedings.mlr.press/v336/tinati26a.html}
}
Endnote
%0 Conference Paper
%T On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry
%A Mohammad Tinati
%A Stephen Tu
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore
%F pmlr-v336-tinati26a
%I PMLR
%P 6197--6309
%U https://proceedings.mlr.press/v336/tinati26a.html
%V 336
APA
Tinati, M. & Tu, S. (2026). On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:6197-6309. Available from https://proceedings.mlr.press/v336/tinati26a.html.
