On Initial Pools for Deep Active Learning

Akshay L. Chandra, Sai Vikas Desai, Chaitanya Devaguptapu, Vineeth N. Balasubramanian
NeurIPS 2020 Workshop on Pre-registration in Machine Learning, PMLR 148:14-32, 2021.

Abstract

Active Learning (AL) techniques aim to minimize the training data required to train a model for a given task. Pool-based AL techniques start with a small initial labeled pool and then iteratively pick batches of the most informative samples for labeling. Generally, the initial pool is sampled randomly and labeled to seed the AL iterations. While recent studies have focused on evaluating the robustness of various query functions in AL, little to no attention has been given to the design of the initial labeled pool for deep active learning. Given the recent successes of learning representations in self-supervised/unsupervised ways, we study if an intelligently sampled initial labeled pool can improve deep AL performance. We investigate the effect of intelligently sampled initial labeled pools, including the use of self-supervised and unsupervised strategies, on deep AL methods. The setup, hypotheses, methodology, and implementation details were evaluated by peer review before experiments were conducted. Experimental results could not conclusively prove that intelligently sampled initial pools are better for AL than random initial pools in the long run, although a Variational Autoencoder-based initial pool sampling strategy showed interesting trends that merit deeper investigation.

Cite this Paper


BibTeX
@InProceedings{pmlr-v148-chandra21a, title = {On Initial Pools for Deep Active Learning}, author = {Chandra, Akshay L. and Desai, Sai Vikas and Devaguptapu, Chaitanya and Balasubramanian, Vineeth N.}, booktitle = {NeurIPS 2020 Workshop on Pre-registration in Machine Learning}, pages = {14--32}, year = {2021}, editor = {Bertinetto, Luca and Henriques, João F. and Albanie, Samuel and Paganini, Michela and Varol, Gül}, volume = {148}, series = {Proceedings of Machine Learning Research}, month = {11 Dec}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v148/chandra21a/chandra21a.pdf}, url = {http://proceedings.mlr.press/v148/chandra21a.html}, abstract = {Active Learning (AL) techniques aim to minimize the training data required to train a model for a given task. Pool-based AL techniques start with a small initial labeled pool and then iteratively pick batches of the most informative samples for labeling. Generally, the initial pool is sampled randomly and labeled to seed the AL iterations. While recent studies have focused on evaluating the robustness of various query functions in AL, little to no attention has been given to the design of the initial labeled pool for deep active learning. Given the recent successes of learning representations in self-supervised/unsupervised ways, we study if an intelligently sampled initial labeled pool can improve deep AL performance. We investigate the effect of intelligently sampled initial labeled pools, including the use of self-supervised and unsupervised strategies, on deep AL methods. The setup, hypotheses, methodology, and implementation details were evaluated by peer review before experiments were conducted. Experimental results could not conclusively prove that intelligently sampled initial pools are better for AL than random initial pools in the long run, although a Variational Autoencoder-based initial pool sampling strategy showed interesting trends that merit deeper investigation.} }
Endnote
%0 Conference Paper %T On Initial Pools for Deep Active Learning %A Akshay L. Chandra %A Sai Vikas Desai %A Chaitanya Devaguptapu %A Vineeth N. Balasubramanian %B NeurIPS 2020 Workshop on Pre-registration in Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Luca Bertinetto %E João F. Henriques %E Samuel Albanie %E Michela Paganini %E Gül Varol %F pmlr-v148-chandra21a %I PMLR %P 14--32 %U http://proceedings.mlr.press/v148/chandra21a.html %V 148 %X Active Learning (AL) techniques aim to minimize the training data required to train a model for a given task. Pool-based AL techniques start with a small initial labeled pool and then iteratively pick batches of the most informative samples for labeling. Generally, the initial pool is sampled randomly and labeled to seed the AL iterations. While recent studies have focused on evaluating the robustness of various query functions in AL, little to no attention has been given to the design of the initial labeled pool for deep active learning. Given the recent successes of learning representations in self-supervised/unsupervised ways, we study if an intelligently sampled initial labeled pool can improve deep AL performance. We investigate the effect of intelligently sampled initial labeled pools, including the use of self-supervised and unsupervised strategies, on deep AL methods. The setup, hypotheses, methodology, and implementation details were evaluated by peer review before experiments were conducted. Experimental results could not conclusively prove that intelligently sampled initial pools are better for AL than random initial pools in the long run, although a Variational Autoencoder-based initial pool sampling strategy showed interesting trends that merit deeper investigation.
APA
Chandra, A.L., Desai, S.V., Devaguptapu, C. & Balasubramanian, V.N.. (2021). On Initial Pools for Deep Active Learning. NeurIPS 2020 Workshop on Pre-registration in Machine Learning, in Proceedings of Machine Learning Research 148:14-32 Available from http://proceedings.mlr.press/v148/chandra21a.html.

Related Material