Active Tolerant Testing

Avrim Blum, Lunjia Hu;
Proceedings of the 31st Conference on Learning Theory, PMLR 75:474-497, 2018.

Abstract

In this work, we show that for a nontrivial hypothesis class $\mathcal C$, we can estimate the distance of a target function $f$ to $\mathcal C$ (estimate the error rate of the best $h\in \mathcal C$) using substantially fewer labeled examples than would be needed to actually {\em learn} a good $h \in \mathcal C$. Specifically, we show that for the class $\mathcal C$ of unions of $d$ intervals on the line, in the active learning setting in which we have access to a pool of unlabeled examples drawn from an arbitrary underlying distribution $\mathcal D$, we can estimate the error rate of the best $h \in \mathcal C$ to an additive error $\epsilon$ with a number of label requests that is {\em independent of $d$} and depends only on $\epsilon$. In particular, we make $O(\frac{1}{\epsilon^6}\log \frac{1}{\epsilon})$ label queries to an unlabeled pool of size $O(\frac{d}{\epsilon^2}\log \frac{1}{\epsilon})$. This task of estimating the distance of an unknown $f$ to a given class $\mathcal C$ is called {\em tolerant testing} or {\em distance estimation} in the testing literature, usually studied in a membership query model and with respect to the uniform distribution. Our work extends that of Balcan et al. (2012), who solved the {\em non}-tolerant testing problem for this class (distinguishing the zero-error case from the case that the best hypothesis in the class has error greater than $\epsilon$).

We also consider the related problem of estimating the performance of a given learning algorithm $\mathcal A$ in this setting. That is, given a large pool of unlabeled examples drawn from distribution $\mathcal D$, can we, from only a few label queries, estimate how well $\mathcal A$ would perform if the entire dataset were labeled and given as training data to $\mathcal A$? We focus on $k$-Nearest Neighbor style algorithms, and also show how our results can be applied to the problem of hyperparameter tuning (selecting the best value of $k$ for the given learning problem).
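To make the target quantity concrete: the "distance of $f$ to $\mathcal C$" on a labeled sample is the smallest fraction of points misclassified by any union of at most $d$ intervals (predicting 1 inside the intervals, 0 outside). The sketch below computes this empirical distance exactly by dynamic programming over the sorted points. It is purely an illustration of the quantity being estimated; it is not the paper's algorithm, which labels only a small subsample rather than the whole pool. The function name and the $\{0,1\}$ label encoding are assumptions for the example.

```python
# Illustrative sketch (not the paper's query-efficient tester): compute the
# exact empirical distance of a labeled sample on the line to the class of
# unions of at most d intervals, via dynamic programming.
def distance_to_d_intervals(points, d):
    """points: iterable of (x, label) pairs with label in {0, 1}.
    Returns the minimum number of disagreements between the labels and
    some hypothesis that predicts 1 on a union of <= d intervals."""
    pts = sorted(points)
    INF = float("inf")
    # dp[j][s]: min errors so far, having opened j intervals,
    # with s = 1 iff the current point position is inside an interval.
    dp = [[INF, INF] for _ in range(d + 1)]
    dp[0][0] = 0
    for _, y in pts:
        ndp = [[INF, INF] for _ in range(d + 1)]
        for j in range(d + 1):
            for s in (0, 1):
                cur = dp[j][s]
                if cur == INF:
                    continue
                # Place this point outside any interval: a 1-label is an error.
                out_cost = cur + (1 if y == 1 else 0)
                ndp[j][0] = min(ndp[j][0], out_cost)
                # Place it inside: continue the open interval (s == 1)
                # or open a new one (costs one of the d intervals).
                nj = j if s == 1 else j + 1
                if nj <= d:
                    in_cost = cur + (1 if y == 0 else 0)
                    ndp[nj][1] = min(ndp[nj][1], in_cost)
        dp = ndp
    return min(min(row) for row in dp)
```

For example, the sample $\{(0,1), (1,1), (2,0), (3,1)\}$ has distance 1 to unions of a single interval (no one interval can capture both groups of 1s without also covering the 0) but distance 0 to unions of two intervals. Dividing the returned count by the sample size gives the empirical error rate that the paper's tester estimates to within additive $\epsilon$ from far fewer label queries.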
