Self-training Converts Weak Learners to Strong Learners in Mixture Models
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:8003-8021, 2022.
Abstract
We consider a binary classification problem when the data comes from a mixture of two rotationally symmetric distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions, among others. We show that there exists a universal constant C_err > 0 such that if a pseudolabeler β_pl can achieve classification error at most C_err, then for any ε > 0, an iterative self-training algorithm initialized at β_0 := β_pl using pseudolabels ŷ = sgn(⟨β_t, x⟩) and using at most Õ(d/ε²) unlabeled examples suffices to learn the Bayes-optimal classifier up to ε error, where d is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler β_pl with classification error C_err using only O(d) labeled examples (i.e., independent of ε). Together our results imply that mixture models can be learned to within ε of the Bayes-optimal accuracy using at most O(d) labeled examples and Õ(d/ε²) unlabeled examples by way of a semi-supervised self-training algorithm.
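To make the two-stage procedure concrete, below is a minimal NumPy sketch of (i) fitting a pseudolabeler by gradient descent on the logistic loss over a small labeled set and (ii) iterative self-training on unlabeled data with pseudolabels ŷ = sgn(⟨β_t, x⟩). The step sizes, iteration counts, and per-step normalization are illustrative assumptions for exposition, not the constants or schedule analyzed in the paper.

```python
import numpy as np

def fit_pseudolabeler(X_lab, y_lab, lr=0.1, n_steps=500):
    """Stage 1: gradient descent on the logistic loss over labeled data.

    Returns a linear pseudolabeler beta_pl (hyperparameters are illustrative).
    """
    n, d = X_lab.shape
    beta = np.zeros(d)
    for _ in range(n_steps):
        margins = y_lab * (X_lab @ beta)
        # Gradient of (1/n) * sum_i log(1 + exp(-y_i <beta, x_i>))
        grad = -(X_lab.T @ (y_lab / (1.0 + np.exp(margins)))) / n
        beta -= lr * grad
    return beta

def self_train(beta_pl, X_unlab, lr=0.01, n_iters=200):
    """Stage 2: iterative self-training on unlabeled data.

    Each iteration generates pseudolabels y_hat = sgn(<beta_t, x>) with the
    current iterate and takes a logistic-loss gradient step against them.
    """
    beta = beta_pl.copy()
    n = X_unlab.shape[0]
    for _ in range(n_iters):
        y_pseudo = np.sign(X_unlab @ beta)
        y_pseudo[y_pseudo == 0] = 1.0  # break ties arbitrarily
        margins = y_pseudo * (X_unlab @ beta)
        grad = -(X_unlab.T @ (y_pseudo / (1.0 + np.exp(margins)))) / n
        beta -= lr * grad
        beta /= np.linalg.norm(beta)  # only the direction matters for sgn(<beta, x>)
    return beta
```

In the setting of the abstract, X_lab would hold the O(d) labeled draws from the mixture, X_unlab the Õ(d/ε²) unlabeled draws, and the returned direction would then be compared against the Bayes-optimal classifier.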