PAC-Bayesian Analysis of the Surrogate Relation between Joint Embedding and Supervised Downstream Losses

Theresa Wasserer, Maximilian Fleissner, Debarghya Ghoshdastidar
Proceedings of The 37th International Conference on Algorithmic Learning Theory, PMLR 313:1-33, 2026.

Abstract

In recent years, self-supervised representation learning (SSL) has become an important learning paradigm and a crucial component of foundation models. SSL-based training pipelines are typically formalized as a sequence of two tasks: a pretext task that learns representations from large amounts of augmented unlabeled data, and a downstream task, where a simple model is fit on the learned representations using only a small amount of labeled data. The strong empirical performance of SSL-based pipelines for prominent joint embedding loss functions is not yet well explained in theory, for two main reasons: a lack of non-vacuous generalization bounds for the models learned in the pretext task, and a lack of practically computable transfer bounds that describe how generalization bounds derived for the pretext task transfer to the downstream task. In this work, we first derive non-vacuous PAC-Bayesian generalization bounds for models optimized in the pretext task with prominent joint embedding SSL loss functions (VICReg, Barlow Twins, and Spectral Contrastive loss), accounting for their non-i.i.d. nature. Next, we provide the first practically computable transfer bounds for the considered loss functions by formally proving a surrogate relation that upper bounds the downstream squared L2 loss by the SSL pretext loss and a measure of the influence of the chosen augmentations that is more accurate than in previous work. In addition, our theoretical analysis identifies effective hyperparameter choices, thereby reducing the need for extensive hyperparameter tuning and offering principled guidance for model selection. We empirically validate our theoretical findings on the CIFAR-10 and MNIST datasets.
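To make the surrogate relation concrete, a schematic reading consistent with the abstract is sketched below; the constants and the exact form of the augmentation term are placeholders, not the paper's actual statement:

\[
\mathbb{E}\big[\,\lVert y - W h(x) \rVert_2^2\,\big] \;\le\; c_1\,\mathcal{L}_{\mathrm{SSL}}(h) \;+\; c_2\,\varepsilon_{\mathrm{aug}},
\]

where \(h\) is the encoder learned in the pretext task, \(W\) a simple (e.g., linear) model fit downstream, \(\mathcal{L}_{\mathrm{SSL}}\) one of the considered pretext losses, \(\varepsilon_{\mathrm{aug}}\) the augmentation-influence measure, and \(c_1, c_2\) constants.

For readers unfamiliar with the joint embedding objectives named above, the following is a minimal PyTorch sketch of the VICReg loss, following the published formulation of Bardes et al. (2022); the function name, default weights, and hyperparameters are illustrative and not taken from this paper.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    """VICReg objective on two (N x D) batches of embeddings, one per
    augmented view. Default weights follow Bardes et al. (2022)."""
    n, d = z1.shape

    # Invariance term: mean squared distance between the two views.
    inv = F.mse_loss(z1, z2)

    # Variance term: hinge that keeps the std of every embedding
    # dimension above the threshold gamma, separately per view.
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = 0.5 * (F.relu(gamma - std1).mean() + F.relu(gamma - std2).mean())

    # Covariance term: penalize off-diagonal entries of each view's
    # empirical covariance matrix, decorrelating embedding dimensions.
    z1c, z2c = z1 - z1.mean(dim=0), z2 - z2.mean(dim=0)
    cov1 = (z1c.T @ z1c) / (n - 1)
    cov2 = (z2c.T @ z2c) / (n - 1)
    off1 = cov1 - torch.diag(torch.diag(cov1))
    off2 = cov2 - torch.diag(torch.diag(cov2))
    cov = off1.pow(2).sum() / d + off2.pow(2).sum() / d

    return lam * inv + mu * var + nu * cov

# Illustrative usage: embeddings of two augmented views of the same batch.
z1 = torch.randn(256, 128)
z2 = z1 + 0.1 * torch.randn(256, 128)
loss = vicreg_loss(z1, z2)
```

Minimizing such a pretext objective over augmented pairs is the first stage of the pipeline; the surrogate relation then bounds the downstream squared L2 loss in terms of the attained pretext loss and the augmentation term.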

Cite this Paper


BibTeX
@InProceedings{pmlr-v313-wasserer26a,
  title     = {PAC-Bayesian Analysis of the Surrogate Relation between Joint Embedding and Supervised Downstream Losses},
  author    = {Wasserer, Theresa and Fleissner, Maximilian and Ghoshdastidar, Debarghya},
  booktitle = {Proceedings of The 37th International Conference on Algorithmic Learning Theory},
  pages     = {1--33},
  year      = {2026},
  editor    = {Telgarsky, Matus and Ullman, Jonathan},
  volume    = {313},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v313/main/assets/wasserer26a/wasserer26a.pdf},
  url       = {https://proceedings.mlr.press/v313/wasserer26a.html}
}
APA
Wasserer, T., Fleissner, M. & Ghoshdastidar, D. (2026). PAC-Bayesian Analysis of the Surrogate Relation between Joint Embedding and Supervised Downstream Losses. Proceedings of The 37th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 313:1-33. Available from https://proceedings.mlr.press/v313/wasserer26a.html.