Active Tolerant Testing

Avrim Blum, Lunjia Hu;
Proceedings of the 31st Conference on Learning Theory, PMLR 75:474-497, 2018.

Abstract

In this work, we show that for a nontrivial hypothesis class $\mathcal C$, we can estimate the distance of a target function $f$ to $\mathcal C$ (estimate the error rate of the best $h\in \mathcal C$) using substantially fewer labeled examples than would be needed to actually {\em learn} a good $h \in \mathcal C$. Specifically, we show that for the class $\mathcal C$ of unions of $d$ intervals on the line, in the active learning setting in which we have access to a pool of unlabeled examples drawn from an arbitrary underlying distribution $\mathcal D$, we can estimate the error rate of the best $h \in \mathcal C$ to an additive error $\epsilon$ with a number of label requests that is {\em independent of $d$} and depends only on $\epsilon$. In particular, we make $O(\frac{1}{\epsilon^6}\log \frac{1}{\epsilon})$ label queries to an unlabeled pool of size $O(\frac{d}{\epsilon^2}\log \frac{1}{\epsilon})$. This task of estimating the distance of an unknown $f$ to a given class $\mathcal C$ is called {\em tolerant testing} or {\em distance estimation} in the testing literature, usually studied in a membership query model and with respect to the uniform distribution. Our work extends that of Balcan et al. (2012), who solved the {\em non}-tolerant testing problem for this class (distinguishing the zero-error case from the case that the best hypothesis in the class has error greater than $\epsilon$).

We also consider the related problem of estimating the performance of a given learning algorithm $\mathcal A$ in this setting. That is, given a large pool of unlabeled examples drawn from distribution $\mathcal D$, can we, from only a few label queries, estimate how well $\mathcal A$ would perform if the entire dataset were labeled and given as training data to $\mathcal A$? We focus on $k$-Nearest Neighbor style algorithms, and also show how our results can be applied to the problem of hyperparameter tuning (selecting the best value of $k$ for the given learning problem).
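To make the target quantity concrete: the "distance of $f$ to $\mathcal C$" on a labeled sample is the smallest fraction of points misclassified by any union of at most $d$ intervals (predicting 1 inside the intervals, 0 outside). The sketch below computes this empirical distance exactly by dynamic programming over the sorted points. It is purely an illustration of the quantity being estimated; it is not the paper's algorithm, which labels only a small subsample rather than the whole pool. The function name and the $\{0,1\}$ label encoding are assumptions for the example.

```python
# Illustrative sketch (not the paper's query-efficient tester): compute the
# exact empirical distance of a labeled sample on the line to the class of
# unions of at most d intervals, via dynamic programming.
def distance_to_d_intervals(points, d):
    """points: iterable of (x, label) pairs with label in {0, 1}.
    Returns the minimum number of disagreements between the labels and
    some hypothesis that predicts 1 on a union of <= d intervals."""
    pts = sorted(points)
    INF = float("inf")
    # dp[j][s]: min errors so far, having opened j intervals,
    # with s = 1 iff the current point position is inside an interval.
    dp = [[INF, INF] for _ in range(d + 1)]
    dp[0][0] = 0
    for _, y in pts:
        ndp = [[INF, INF] for _ in range(d + 1)]
        for j in range(d + 1):
            for s in (0, 1):
                cur = dp[j][s]
                if cur == INF:
                    continue
                # Place this point outside any interval: a 1-label is an error.
                out_cost = cur + (1 if y == 1 else 0)
                ndp[j][0] = min(ndp[j][0], out_cost)
                # Place it inside: continue the open interval (s == 1)
                # or open a new one (costs one of the d intervals).
                nj = j if s == 1 else j + 1
                if nj <= d:
                    in_cost = cur + (1 if y == 0 else 0)
                    ndp[nj][1] = min(ndp[nj][1], in_cost)
        dp = ndp
    return min(min(row) for row in dp)
```

For example, the sample $\{(0,1), (1,1), (2,0), (3,1)\}$ has distance 1 to unions of a single interval (no one interval can capture both groups of 1s without also covering the 0) but distance 0 to unions of two intervals. Dividing the returned count by the sample size gives the empirical error rate that the paper's tester estimates to within additive $\epsilon$ from far fewer label queries.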
