Constants Matter: The Performance Gains of Active Learning

Stephen O Mussmann; Sanjoy Dasgupta

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:16123-16173, 2022.

Abstract

Within machine learning, active learning studies the gains in performance made possible by adaptively selecting data points to label. In this work, we show through upper and lower bounds, that for a simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling have the same inverse proportional dependence on the number of samples. Importantly, due to the nature of lower bounds, any more general setting does not allow a better dependence on the number of samples. Additionally, we show a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, a recent empirical observation made by other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-mussmann22a,
  title = 	 {Constants Matter: The Performance Gains of Active Learning},
  author =       {Mussmann, Stephen O and Dasgupta, Sanjoy},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {16123--16173},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/mussmann22a/mussmann22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/mussmann22a.html},
  abstract = 	 {Within machine learning, active learning studies the gains in performance made possible by adaptively selecting data points to label. In this work, we show through upper and lower bounds, that for a simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling have the same inverse proportional dependence on the number of samples. Importantly, due to the nature of lower bounds, any more general setting does not allow a better dependence on the number of samples. Additionally, we show a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, a recent empirical observation made by other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants.}
}

Endnote

%0 Conference Paper
%T Constants Matter: The Performance Gains of Active Learning
%A Stephen O Mussmann
%A Sanjoy Dasgupta
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-mussmann22a
%I PMLR
%P 16123--16173
%U https://proceedings.mlr.press/v162/mussmann22a.html
%V 162
%X Within machine learning, active learning studies the gains in performance made possible by adaptively selecting data points to label. In this work, we show through upper and lower bounds, that for a simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling have the same inverse proportional dependence on the number of samples. Importantly, due to the nature of lower bounds, any more general setting does not allow a better dependence on the number of samples. Additionally, we show a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, a recent empirical observation made by other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants.

APA


Mussmann, S. O., & Dasgupta, S. (2022). Constants Matter: The Performance Gains of Active Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:16123-16173. Available from https://proceedings.mlr.press/v162/mussmann22a.html.
