Distribution-independent Reliable Learning

[edit]

Varun Kanade, Justin Thaler ;
Proceedings of The 27th Conference on Learning Theory, PMLR 35:3-24, 2014.

Abstract

We study several questions in the \emphreliable agnostic learning framework of Kalai et al. (2009), which captures learning tasks in which one type of error is costlier than other types. A positive reliable classifier is one that makes no false positive errors. The goal in the \emphpositive reliable agnostic framework is to output a hypothesis with the following properties: (i) its false positive error rate is at most ε, (ii) its false negative error rate is at most εmore than that of the best positive reliable classifier from the class. A closely related notion is \emphfully reliable agnostic learning, which considers \emphpartial classifiers that are allowed to predict “unknown” on some inputs. The best fully reliable partial classifier is one that makes no errors and minimizes the probability of predicting “unknown”, and the goal in fully reliable learning is to output a hypothesis that is almost as good as the best fully reliable partial classifier from a class. For distribution-independent learning, the best known algorithms for PAC learning typically utilize polynomial threshold representations, while the state of the art agnostic learning algorithms use point-wise polynomial approximations. We show that \emphone-sided polynomial approximations, an intermediate notion between polynomial threshold representations and point-wise polynomial approximations, suffice for learning in the reliable agnostic settings. We then show that majorities can be fully reliably learned and disjunctions of majorities can be positive reliably learned, through constructions of appropriate one-sided polynomial approximations. Our fully reliable algorithm for majorities provides the first evidence that fully reliable learning may be strictly easier than agnostic learning. Our algorithms also satisfy strong attribute-efficiency properties, and in many cases they provide smooth tradeoffs between sample complexity and running time.

Related Material