GALAXY: Graph-based Active Learning at the Extreme

Jifan Zhang, Julian Katz-Samuels, Robert Nowak
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:26223-26238, 2022.

Abstract

Active learning is a label-efficient approach to train highly effective models while interactively selecting only small subsets of unlabelled data for labelling and training. In “open world” settings, the classes of interest can make up a small fraction of the overall dataset – most of the data may be viewed as an out-of-distribution or irrelevant class. This leads to extreme class-imbalance, and our theory and methods focus on this core issue. We propose a new strategy for active learning called GALAXY (Graph-based Active Learning At the eXtrEme), which blends ideas from graph-based active learning and deep learning. GALAXY automatically and adaptively selects more class-balanced examples for labeling than most other methods for active learning. Our theory shows that GALAXY performs a refined form of uncertainty sampling that gathers a much more class-balanced dataset than vanilla uncertainty sampling. Experimentally, we demonstrate GALAXY’s superiority over existing state-of-the-art deep active learning algorithms in unbalanced vision classification settings generated from popular datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-zhang22k,
  title     = {{GALAXY}: Graph-based Active Learning at the Extreme},
  author    = {Zhang, Jifan and Katz-Samuels, Julian and Nowak, Robert},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {26223--26238},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/zhang22k/zhang22k.pdf},
  url       = {https://proceedings.mlr.press/v162/zhang22k.html},
  abstract  = {Active learning is a label-efficient approach to train highly effective models while interactively selecting only small subsets of unlabelled data for labelling and training. In “open world” settings, the classes of interest can make up a small fraction of the overall dataset – most of the data may be viewed as an out-of-distribution or irrelevant class. This leads to extreme class-imbalance, and our theory and methods focus on this core issue. We propose a new strategy for active learning called GALAXY (Graph-based Active Learning At the eXtrEme), which blends ideas from graph-based active learning and deep learning. GALAXY automatically and adaptively selects more class-balanced examples for labeling than most other methods for active learning. Our theory shows that GALAXY performs a refined form of uncertainty sampling that gathers a much more class-balanced dataset than vanilla uncertainty sampling. Experimentally, we demonstrate GALAXY’s superiority over existing state-of-the-art deep active learning algorithms in unbalanced vision classification settings generated from popular datasets.}
}
Endnote
%0 Conference Paper
%T GALAXY: Graph-based Active Learning at the Extreme
%A Jifan Zhang
%A Julian Katz-Samuels
%A Robert Nowak
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-zhang22k
%I PMLR
%P 26223--26238
%U https://proceedings.mlr.press/v162/zhang22k.html
%V 162
%X Active learning is a label-efficient approach to train highly effective models while interactively selecting only small subsets of unlabelled data for labelling and training. In “open world” settings, the classes of interest can make up a small fraction of the overall dataset – most of the data may be viewed as an out-of-distribution or irrelevant class. This leads to extreme class-imbalance, and our theory and methods focus on this core issue. We propose a new strategy for active learning called GALAXY (Graph-based Active Learning At the eXtrEme), which blends ideas from graph-based active learning and deep learning. GALAXY automatically and adaptively selects more class-balanced examples for labeling than most other methods for active learning. Our theory shows that GALAXY performs a refined form of uncertainty sampling that gathers a much more class-balanced dataset than vanilla uncertainty sampling. Experimentally, we demonstrate GALAXY’s superiority over existing state-of-the-art deep active learning algorithms in unbalanced vision classification settings generated from popular datasets.
APA
Zhang, J., Katz-Samuels, J. & Nowak, R. (2022). GALAXY: Graph-based Active Learning at the Extreme. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:26223-26238. Available from https://proceedings.mlr.press/v162/zhang22k.html.

Related Material