Weak-to-Strong Generalization Even in Random Feature Networks, Provably

Marko Medvedev, Kaifeng Lyu, Dingli Yu, Sanjeev Arora, Zhiyuan Li, Nathan Srebro
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:43519-43556, 2025.

Abstract

Weak-to-Strong Generalization (Burns et al., 2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does not require a complex, pretrained learner like GPT-4: it can arise even in simple non-pretrained models, simply due to the size advantage of the student. But we also show that there are inherent limits to the extent of such weak-to-strong generalization. We consider students and teachers that are random feature models, described by two-layer networks with a random, fixed bottom layer and a trained top layer. A ‘weak’ teacher, with a small number of units (i.e. random features), is trained on the population, and a ‘strong’ student, with a much larger number of units (i.e. random features), is trained only on labels generated by the weak teacher. We demonstrate, prove, and understand how the student can outperform the teacher, even though it is trained only on data labeled by the teacher. We also explain how such weak-to-strong generalization is enabled by early stopping. We then show the quantitative limits of weak-to-strong generalization in this model, and in fact in a much broader class of models, for arbitrary teacher and student feature spaces and a broad class of learning rules, including when the student features are pretrained or otherwise more informative. In particular, we show that in such models the student’s error can only approach zero if the teacher’s error approaches zero, and a strong student cannot “boost” a slightly-better-than-chance teacher to obtain a small error.
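To make the teacher–student construction concrete, here is a minimal numerical sketch of the setup described in the abstract. It is an illustration under simplifying assumptions, not the paper's configuration: Gaussian inputs, ReLU random features, a least-squares fit on a large sample standing in for training the teacher "on the population", and ridge regression standing in for gradient descent with early stopping; all dimensions and the target function f_star are made up for the example.

```python
# Minimal sketch of the weak-to-strong random-feature setup (illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)
d = 20                    # input dimension (illustrative)
n_teacher_feats = 40      # "weak" teacher: few random features
n_student_feats = 2000    # "strong" student: many random features
n_train, n_test = 1000, 5000

def rf(X, W):
    """Random ReLU features: a fixed random bottom layer followed by ReLU."""
    return np.maximum(X @ W, 0.0)

def f_star(X):
    """Illustrative ground-truth target (not from the paper)."""
    return np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

# Teacher: least-squares fit of its few random features to the true labels on a
# large sample, standing in for being "trained on the population".
W_teacher = rng.standard_normal((d, n_teacher_feats)) / np.sqrt(d)
X_pop = rng.standard_normal((10000, d))
a_teacher, *_ = np.linalg.lstsq(rf(X_pop, W_teacher), f_star(X_pop), rcond=None)
teacher = lambda X: rf(X, W_teacher) @ a_teacher

# Student: many random features, trained ONLY on teacher-generated labels.
# Ridge regularization stands in for gradient descent with early stopping.
W_student = rng.standard_normal((d, n_student_feats)) / np.sqrt(d)
X_train = rng.standard_normal((n_train, d))
Phi = rf(X_train, W_student)
lam = 1e-1  # capacity control, playing the role of early stopping
a_student = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_student_feats),
                            Phi.T @ teacher(X_train))
student = lambda X: rf(X, W_student) @ a_student

# Compare test errors of teacher and student against the TRUE target.
X_test = rng.standard_normal((n_test, d))
y_test = f_star(X_test)
mse = lambda pred: np.mean((pred(X_test) - y_test) ** 2)
print(f"teacher MSE: {mse(teacher):.4f}   student MSE: {mse(student):.4f}")
```

In a run of this kind, the student, despite seeing only teacher labels, can end up closer to the true target than the teacher itself, which is the weak-to-strong effect the paper analyzes; the paper's limits results say the student's error still cannot approach zero unless the teacher's does.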

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-medvedev25a,
  title = {Weak-to-Strong Generalization Even in Random Feature Networks, Provably},
  author = {Medvedev, Marko and Lyu, Kaifeng and Yu, Dingli and Arora, Sanjeev and Li, Zhiyuan and Srebro, Nathan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {43519--43556},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/medvedev25a/medvedev25a.pdf},
  url = {https://proceedings.mlr.press/v267/medvedev25a.html},
  abstract = {Weak-to-Strong Generalization (Burns et al., 2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does not require a complex, pretrained learner like GPT-4: it can arise even in simple non-pretrained models, simply due to the size advantage of the student. But we also show that there are inherent limits to the extent of such weak-to-strong generalization. We consider students and teachers that are random feature models, described by two-layer networks with a random, fixed bottom layer and a trained top layer. A ‘weak’ teacher, with a small number of units (i.e. random features), is trained on the population, and a ‘strong’ student, with a much larger number of units (i.e. random features), is trained only on labels generated by the weak teacher. We demonstrate, prove, and understand how the student can outperform the teacher, even though it is trained only on data labeled by the teacher. We also explain how such weak-to-strong generalization is enabled by early stopping. We then show the quantitative limits of weak-to-strong generalization in this model, and in fact in a much broader class of models, for arbitrary teacher and student feature spaces and a broad class of learning rules, including when the student features are pretrained or otherwise more informative. In particular, we show that in such models the student’s error can only approach zero if the teacher’s error approaches zero, and a strong student cannot “boost” a slightly-better-than-chance teacher to obtain a small error.}
}
Endnote
%0 Conference Paper
%T Weak-to-Strong Generalization Even in Random Feature Networks, Provably
%A Marko Medvedev
%A Kaifeng Lyu
%A Dingli Yu
%A Sanjeev Arora
%A Zhiyuan Li
%A Nathan Srebro
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-medvedev25a
%I PMLR
%P 43519--43556
%U https://proceedings.mlr.press/v267/medvedev25a.html
%V 267
%X Weak-to-Strong Generalization (Burns et al., 2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does not require a complex, pretrained learner like GPT-4: it can arise even in simple non-pretrained models, simply due to the size advantage of the student. But we also show that there are inherent limits to the extent of such weak-to-strong generalization. We consider students and teachers that are random feature models, described by two-layer networks with a random, fixed bottom layer and a trained top layer. A ‘weak’ teacher, with a small number of units (i.e. random features), is trained on the population, and a ‘strong’ student, with a much larger number of units (i.e. random features), is trained only on labels generated by the weak teacher. We demonstrate, prove, and understand how the student can outperform the teacher, even though it is trained only on data labeled by the teacher. We also explain how such weak-to-strong generalization is enabled by early stopping. We then show the quantitative limits of weak-to-strong generalization in this model, and in fact in a much broader class of models, for arbitrary teacher and student feature spaces and a broad class of learning rules, including when the student features are pretrained or otherwise more informative. In particular, we show that in such models the student’s error can only approach zero if the teacher’s error approaches zero, and a strong student cannot “boost” a slightly-better-than-chance teacher to obtain a small error.
APA
Medvedev, M., Lyu, K., Yu, D., Arora, S., Li, Z. & Srebro, N. (2025). Weak-to-Strong Generalization Even in Random Feature Networks, Provably. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:43519-43556. Available from https://proceedings.mlr.press/v267/medvedev25a.html.
