Doubly Aggressive Selective Sampling Algorithms for Classification

[edit]

Koby Crammer ;
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:140-148, 2014.

Abstract

Online selective sampling algorithms learn to perform binary classification, and additionally they decided whether to ask, or query, for a label of any given example. We introduce two stochastic linear algorithms and analyze them in the worst-case mistake-bound framework. Even though stochastic, for some inputs, our algorithms query with probability 1 and make an update even if there is no mistake, yet the margin is small, hence they are doubly aggressive. We prove bounds in the worst-case settings, which may be lower than previous bounds in some settings. Experiments with 33 document classification datasets, some with 100Ks examples, show the superiority of doubly-aggressive algorithms both in performance and number of queries.

Related Material