Bounds on the Generalization Error in Active Learning

Vincent Menden, Yahya Saleh, Armin Iske
Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL), PMLR 265:168-175, 2025.

Abstract

We establish empirical risk minimization principles for active learning by deriving a family of upper bounds on the generalization error. Aligning with empirical observations, the bounds suggest that superior query algorithms can be obtained by combining both informativeness and representativeness query strategies, where the latter is assessed using integral probability metrics. To facilitate the use of these bounds in application, we systematically link diverse active learning scenarios, characterized by their loss functions and hypothesis classes, to their corresponding upper bounds. Our results show that regularization techniques used to constrain the complexity of various hypothesis classes are sufficient conditions to ensure the validity of the bounds. The present work enables principled construction and empirical quality-evaluation of query algorithms in active learning.

Cite this Paper


BibTeX
@InProceedings{pmlr-v265-menden25a,
  title = {Bounds on the Generalization Error in Active Learning},
  author = {Menden, Vincent and Saleh, Yahya and Iske, Armin},
  booktitle = {Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL)},
  pages = {168--175},
  year = {2025},
  editor = {Lutchyn, Tetiana and Ramírez Rivera, Adín and Ricaud, Benjamin},
  volume = {265},
  series = {Proceedings of Machine Learning Research},
  month = {07--09 Jan},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v265/main/assets/menden25a/menden25a.pdf},
  url = {https://proceedings.mlr.press/v265/menden25a.html},
  abstract = {We establish empirical risk minimization principles for active learning by deriving a family of upper bounds on the generalization error. Aligning with empirical observations, the bounds suggest that superior query algorithms can be obtained by combining both informativeness and representativeness query strategies, where the latter is assessed using integral probability metrics. To facilitate the use of these bounds in application, we systematically link diverse active learning scenarios, characterized by their loss functions and hypothesis classes, to their corresponding upper bounds. Our results show that regularization techniques used to constrain the complexity of various hypothesis classes are sufficient conditions to ensure the validity of the bounds. The present work enables principled construction and empirical quality-evaluation of query algorithms in active learning.}
}
Endnote
%0 Conference Paper
%T Bounds on the Generalization Error in Active Learning
%A Vincent Menden
%A Yahya Saleh
%A Armin Iske
%B Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL)
%C Proceedings of Machine Learning Research
%D 2025
%E Tetiana Lutchyn
%E Adín Ramírez Rivera
%E Benjamin Ricaud
%F pmlr-v265-menden25a
%I PMLR
%P 168--175
%U https://proceedings.mlr.press/v265/menden25a.html
%V 265
%X We establish empirical risk minimization principles for active learning by deriving a family of upper bounds on the generalization error. Aligning with empirical observations, the bounds suggest that superior query algorithms can be obtained by combining both informativeness and representativeness query strategies, where the latter is assessed using integral probability metrics. To facilitate the use of these bounds in application, we systematically link diverse active learning scenarios, characterized by their loss functions and hypothesis classes, to their corresponding upper bounds. Our results show that regularization techniques used to constrain the complexity of various hypothesis classes are sufficient conditions to ensure the validity of the bounds. The present work enables principled construction and empirical quality-evaluation of query algorithms in active learning.
APA
Menden, V., Saleh, Y. & Iske, A. (2025). Bounds on the Generalization Error in Active Learning. Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL), in Proceedings of Machine Learning Research 265:168-175. Available from https://proceedings.mlr.press/v265/menden25a.html.
