A New Concentration Inequality for Sampling Without Replacement and Its Application for Transductive Learning

Yingzhen Yang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:70472-70497, 2025.

Abstract

We introduce a new tool, Transductive Local Complexity (TLC), to analyze the generalization performance of transductive learning methods and to motivate new transductive learning algorithms. Our work extends the idea of the popular Local Rademacher Complexity (LRC) to the transductive setting, with substantial and novel changes compared to the analysis of typical LRC methods in the inductive setting. While LRC has been widely used as a powerful tool for analyzing inductive models, yielding sharp generalization bounds for classification and minimax rates for nonparametric regression, it has remained an open problem whether a localized, Rademacher-complexity-based tool can be designed for transductive learning that yields sharp bounds consistent with the inductive excess risk bounds obtained via LRC. We give an affirmative answer to this open problem with TLC. Mirroring the development of LRC, we build TLC by first establishing a novel and sharp concentration inequality for the supremum of the empirical process of the gap between test and training loss under uniform sampling without replacement. A peeling strategy and a new surrogate variance operator are then used to derive an excess risk bound in the transductive setting that is consistent with the classical LRC-based excess risk bound in the inductive setting. As an application of TLC, we analyze the Transductive Kernel Learning (TKL) model and derive a sharper excess risk bound than the current state of the art. As a result of independent interest, the concentration inequality for the test-train process is used to derive a sharp concentration inequality for the general supremum of an empirical process involving random variables sampled uniformly without replacement, with a comparison to existing concentration inequalities.
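
For orientation, the following is a minimal LaTeX sketch of the quantities the abstract refers to, written in standard transductive-learning notation. The index sets S and T, the sizes m and u, the function class F, and the loss ell are illustrative choices on our part; the paper's exact definitions and normalizations may differ.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Transductive setting (illustrative, standard notation): a fixed pool of
% m + u labeled points is given; a training index set S of size m is drawn
% uniformly at random WITHOUT replacement, and the remaining u indices form
% the test set T. The concentration inequality controls the supremum of the
% test-train gap over the function class F:
\[
  \sup_{f \in \mathcal{F}}
  \left(
    \frac{1}{u} \sum_{i \in T} \ell\bigl(f(x_i), y_i\bigr)
    \;-\;
    \frac{1}{m} \sum_{i \in S} \ell\bigl(f(x_i), y_i\bigr)
  \right).
\]
% The localized analysis (TLC) then restricts F to functions of small
% surrogate variance and applies a peeling argument, in analogy with the
% LRC-based localization used in the inductive setting.
\end{document}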

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-yang25a,
  title     = {A New Concentration Inequality for Sampling Without Replacement and Its Application for Transductive Learning},
  author    = {Yang, Yingzhen},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {70472--70497},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/yang25a/yang25a.pdf},
  url       = {https://proceedings.mlr.press/v267/yang25a.html}
}
APA
Yang, Y. (2025). A New Concentration Inequality for Sampling Without Replacement and Its Application for Transductive Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:70472-70497. Available from https://proceedings.mlr.press/v267/yang25a.html.
