Active Learning on a Budget: Opposite Strategies Suit High and Low Budgets

Guy Hacohen, Avihu Dekel, Daphna Weinshall
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:8175-8195, 2022.

Abstract

Investigating active learning, we focus on the relation between the number of labeled examples (budget size) and suitable querying strategies. Our theoretical analysis shows a behavior reminiscent of a phase transition: typical examples are best queried when the budget is low, while unrepresentative examples are best queried when the budget is large. Combined evidence shows that a similar phenomenon occurs in common classification models. Accordingly, we propose TypiClust – a deep active learning strategy suited for low budgets. In a comparative empirical investigation of supervised learning, using a variety of architectures and image datasets, TypiClust outperforms all other active learning strategies in the low-budget regime. Using TypiClust in the semi-supervised framework, performance gets an even more significant boost. In particular, state-of-the-art semi-supervised methods trained on CIFAR-10 with 10 labeled examples selected by TypiClust reach 93.2% accuracy – an improvement of 39.4% over random selection. Code is available at https://github.com/avihu111/TypiClust.
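
The low-budget idea of querying typical examples can be illustrated with a small sketch. The abstract does not say how typicality is measured or how clusters are formed, so the code below rests on assumptions: typicality is taken to be the inverse of the mean distance to a point's k nearest neighbours within its cluster, and the cluster assignments are supplied as input. The names `typicality`, `select_typical`, `points`, and `cluster_ids` are hypothetical and are not taken from the authors' repository.

```python
import math
from collections import defaultdict


def typicality(point, others, k):
    """Assumed measure: inverse mean distance to the k nearest neighbours."""
    if not others:
        return 0.0  # a lone point has no neighbours to be typical of
    nearest = sorted(math.dist(point, q) for q in others)[:k]
    return 1.0 / (sum(nearest) / len(nearest) + 1e-12)


def select_typical(points, cluster_ids, budget, k=2):
    """Return up to `budget` indices, one per cluster, largest clusters first."""
    clusters = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        clusters[cid].append(idx)
    # Visit clusters from largest to smallest so dense regions are covered first.
    ordered = sorted(clusters.values(), key=len, reverse=True)
    chosen = []
    for members in ordered[:budget]:
        # Within a cluster, keep the point with the highest typicality score.
        best = max(
            members,
            key=lambda i: typicality(
                points[i], [points[j] for j in members if j != i], k
            ),
        )
        chosen.append(best)
    return chosen


# Toy data: two hand-labelled clusters (hypothetical, for illustration only).
points = [
    (0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 4.0),  # cluster 0
    (10.0, 10.0), (10.0, 11.0), (10.0, 13.0),          # cluster 1
]
cluster_ids = [0, 0, 0, 0, 1, 1, 1]

print(select_typical(points, cluster_ids, budget=2))  # [0, 5]
```

With a budget of 2, the sketch picks the central point of each cluster (indices 0 and 5) and skips the outlying points, which is the behaviour the abstract associates with the low-budget regime. An uncertainty-based strategy, which the abstract suggests suits larger budgets, would favour the opposite kind of example.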

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-hacohen22a,
  title = {Active Learning on a Budget: Opposite Strategies Suit High and Low Budgets},
  author = {Hacohen, Guy and Dekel, Avihu and Weinshall, Daphna},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {8175--8195},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/hacohen22a/hacohen22a.pdf},
  url = {https://proceedings.mlr.press/v162/hacohen22a.html},
  abstract = {Investigating active learning, we focus on the relation between the number of labeled examples (budget size), and suitable querying strategies. Our theoretical analysis shows a behavior reminiscent of phase transition: typical examples are best queried when the budget is low, while unrepresentative examples are best queried when the budget is large. Combined evidence shows that a similar phenomenon occurs in common classification models. Accordingly, we propose TypiClust – a deep active learning strategy suited for low budgets. In a comparative empirical investigation of supervised learning, using a variety of architectures and image datasets, TypiClust outperforms all other active learning strategies in the low-budget regime. Using TypiClust in the semi-supervised framework, performance gets an even more significant boost. In particular, state-of-the-art semi-supervised methods trained on CIFAR-10 with 10 labeled examples selected by TypiClust, reach 93.2% accuracy – an improvement of 39.4% over random selection. Code is available at https://github.com/avihu111/TypiClust.}
}
Endnote
%0 Conference Paper
%T Active Learning on a Budget: Opposite Strategies Suit High and Low Budgets
%A Guy Hacohen
%A Avihu Dekel
%A Daphna Weinshall
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-hacohen22a
%I PMLR
%P 8175--8195
%U https://proceedings.mlr.press/v162/hacohen22a.html
%V 162
%X Investigating active learning, we focus on the relation between the number of labeled examples (budget size), and suitable querying strategies. Our theoretical analysis shows a behavior reminiscent of phase transition: typical examples are best queried when the budget is low, while unrepresentative examples are best queried when the budget is large. Combined evidence shows that a similar phenomenon occurs in common classification models. Accordingly, we propose TypiClust – a deep active learning strategy suited for low budgets. In a comparative empirical investigation of supervised learning, using a variety of architectures and image datasets, TypiClust outperforms all other active learning strategies in the low-budget regime. Using TypiClust in the semi-supervised framework, performance gets an even more significant boost. In particular, state-of-the-art semi-supervised methods trained on CIFAR-10 with 10 labeled examples selected by TypiClust, reach 93.2% accuracy – an improvement of 39.4% over random selection. Code is available at https://github.com/avihu111/TypiClust.
APA
Hacohen, G., Dekel, A. & Weinshall, D. (2022). Active Learning on a Budget: Opposite Strategies Suit High and Low Budgets. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:8175-8195. Available from https://proceedings.mlr.press/v162/hacohen22a.html.
