Active Tolerant Testing

Avrim Blum, Lunjia Hu
Proceedings of the 31st Conference On Learning Theory, PMLR 75:474-497, 2018.

Abstract

In this work, we show that for a nontrivial hypothesis class $\mathcal C$, we can estimate the distance of a target function $f$ to $\mathcal C$ (estimate the error rate of the best $h\in \mathcal C$) using substantially fewer labeled examples than would be needed to actually {\em learn} a good $h \in \mathcal C$. Specifically, we show that for the class $\mathcal C$ of unions of $d$ intervals on the line, in the active learning setting in which we have access to a pool of unlabeled examples drawn from an arbitrary underlying distribution $\mathcal D$, we can estimate the error rate of the best $h \in \mathcal C$ to an additive error $\epsilon$ with a number of label requests that is {\em independent of $d$} and depends only on $\epsilon$. In particular, we make $O(\frac{1}{\epsilon^6}\log \frac{1}{\epsilon})$ label queries to an unlabeled pool of size $O(\frac{d}{\epsilon^2}\log \frac{1}{\epsilon})$. This task of estimating the distance of an unknown $f$ to a given class $\mathcal C$ is called {\em tolerant testing} or {\em distance estimation} in the testing literature, usually studied in a membership query model and with respect to the uniform distribution. Our work extends that of Balcan et al. (2012), who solved the {\em non}-tolerant testing problem for this class (distinguishing the zero-error case from the case that the best hypothesis in the class has error greater than $\epsilon$). We also consider the related problem of estimating the performance of a given learning algorithm $\mathcal A$ in this setting. That is, given a large pool of unlabeled examples drawn from distribution $\mathcal D$, can we, from only a few label queries, estimate how well $\mathcal A$ would perform if the entire dataset were labeled and given as training data to $\mathcal A$? We focus on $k$-Nearest Neighbor style algorithms, and also show how our results can be applied to the problem of hyperparameter tuning (selecting the best value of $k$ for the given learning problem).
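To make the quantity being estimated concrete: on a finite labeled sample, the distance of $f$ to the class of unions of $d$ intervals is the smallest fraction of sample points misclassified by any such union, and it can be computed exactly by a small dynamic program over the points in sorted order. The Python sketch below is our own illustration of that baseline quantity (it assumes every label is known), not the paper's label-efficient tester, which queries only a small subsample of the pool.

    def dist_to_d_intervals(xs, ys, d):
        """Fraction of points misclassified by the best union of <= d intervals.

        xs: 1-D feature values; ys: labels in {0, 1}. A union-of-intervals
        hypothesis predicts 1 inside its intervals and 0 outside. This is a
        plain O(n * d) dynamic program over the points in sorted order; it
        is NOT the paper's tester, since it reads every label.
        """
        ys_sorted = [y for _, y in sorted(zip(xs, ys))]
        INF = float("inf")
        # dp[j][s] = min disagreements so far, having opened j intervals,
        # with the current region labeled s (0 = outside, 1 = inside).
        dp = [[INF, INF] for _ in range(d + 1)]
        dp[0][0] = 0
        for y in ys_sorted:
            new = [[INF, INF] for _ in range(d + 1)]
            for j in range(d + 1):
                # stay in the current region (outside or inside)
                for s in (0, 1):
                    if dp[j][s] < INF:
                        new[j][s] = min(new[j][s], dp[j][s] + (y != s))
                # open a new interval (outside -> inside), if budget allows
                if j < d and dp[j][0] < INF:
                    new[j + 1][1] = min(new[j + 1][1], dp[j][0] + (y != 1))
                # close the current interval (inside -> outside)
                if dp[j][1] < INF:
                    new[j][0] = min(new[j][0], dp[j][1] + (y != 0))
            dp = new
        return min(min(row) for row in dp) / len(ys_sorted)

For example, dist_to_d_intervals([1, 2, 3, 4], [1, 0, 1, 1], 1) returns 0.25: any single interval must either skip a positive point or swallow the negative one.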
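For the second problem, here is a similarly hedged illustration (our own, not the paper's construction) for the special case of 1-Nearest Neighbor: the error of 1-NN trained on the fully labeled pool is closely tied to the probability that a pool point and its nearest neighbor carry different labels, and that probability can be estimated with a number of label queries that depends only on the desired accuracy, not on the pool size. Choosing the best $k$ then amounts to running such an estimator for each candidate value of $k$ and keeping the minimizer.

    import random

    def estimate_1nn_error(pool, query_label, budget):
        """Estimate how well 1-NN would do if the whole pool were labeled.

        pool: unlabeled 1-D points (len >= 2); query_label(x): label oracle,
        one label query per call; budget: number of points to sample.
        A leave-one-out proxy, not the paper's estimator: sample `budget`
        pool points, query each point and its nearest other pool point
        (2 label queries per sample, 2 * budget total, independent of
        the pool size), and report the observed disagreement rate.
        """
        pool = sorted(pool)  # in 1-D, nearest neighbors are adjacent in sorted order
        disagreements = 0
        for i in random.sample(range(len(pool)), budget):
            cands = [j for j in (i - 1, i + 1) if 0 <= j < len(pool)]
            nn = min(cands, key=lambda j: abs(pool[j] - pool[i]))
            disagreements += query_label(pool[i]) != query_label(pool[nn])
        return disagreements / budget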

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-blum18a,
  title     = {Active Tolerant Testing},
  author    = {Blum, Avrim and Hu, Lunjia},
  booktitle = {Proceedings of the 31st Conference On Learning Theory},
  pages     = {474--497},
  year      = {2018},
  editor    = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume    = {75},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v75/blum18a/blum18a.pdf},
  url       = {https://proceedings.mlr.press/v75/blum18a.html}
}
APA
Blum, A. & Hu, L. (2018). Active Tolerant Testing. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:474-497. Available from https://proceedings.mlr.press/v75/blum18a.html.
