Active Learning on a Budget: Opposite Strategies Suit High and Low Budgets
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:8175-8195, 2022.
Abstract
Investigating active learning, we focus on the relation between the number of labeled examples (budget size) and suitable querying strategies. Our theoretical analysis shows a behavior reminiscent of a phase transition: typical examples are best queried when the budget is low, while unrepresentative examples are best queried when the budget is large. Combined theoretical and empirical evidence shows that a similar phenomenon occurs in common classification models. Accordingly, we propose TypiClust, a deep active learning strategy suited for low budgets. In a comparative empirical investigation of supervised learning, using a variety of architectures and image datasets, TypiClust outperforms all other active learning strategies in the low-budget regime. When TypiClust is used in the semi-supervised framework, the performance boost is even larger. In particular, state-of-the-art semi-supervised methods trained on CIFAR-10 with 10 labeled examples selected by TypiClust reach 93.2% accuracy, an improvement of 39.4% over random selection. Code is available at https://github.com/avihu111/TypiClust.
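As a rough illustration of the low-budget querying strategy described above, the sketch below selects "typical" examples by clustering feature vectors and picking the densest point in each cluster. The feature encoder, the k-nearest-neighbor typicality score, and the use of KMeans here are illustrative assumptions, not necessarily the exact TypiClust procedure; see the linked repository for the authors' implementation.

```python
# Minimal sketch of typicality-based querying for a low budget.
# Assumptions: features come from some (e.g., self-supervised) encoder;
# typicality is the inverse mean distance to the k nearest neighbors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def typicality(features: np.ndarray, k: int = 20) -> np.ndarray:
    """Typicality of each point: inverse of its mean distance to its k nearest neighbors."""
    k = min(k, len(features) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    dists, _ = nn.kneighbors(features)  # column 0 is each point's zero distance to itself
    return 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)


def select_typical(features: np.ndarray, budget: int, k: int = 20) -> np.ndarray:
    """Partition the data into `budget` clusters and query the most typical point of each."""
    labels = KMeans(n_clusters=budget, n_init=10, random_state=0).fit_predict(features)
    scores = typicality(features, k)
    selected = []
    for c in range(budget):
        members = np.flatnonzero(labels == c)
        selected.append(members[np.argmax(scores[members])])
    return np.array(selected)


# Usage: with a budget of 10, query the indices of 10 typical examples.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 128)).astype(np.float32)
    print(select_typical(feats, budget=10))
```

Selecting one typical example per cluster is one way to realize the abstract's claim that representative (rather than unrepresentative) examples should be labeled first when very few labels are available.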