Learning in Hybrid Noise Environments Using Statistical Queries
Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, PMLR R0:175-185, 1995.
We consider theoretical models of learning from noisy data. Specifically, we focus on learning in the probably approximately correct (PAC) model as defined by Valiant. Two of the most widely studied models of noise in this setting are classification noise and malicious errors. However, a more realistic model combining the two types of noise has not been formalized. We define a learning environment based on a natural combination of these two noise models. We first show that hypothesis testing is possible in this model. We next describe a simple technique for learning in this model, and then describe a more powerful technique based on statistical query learning. We show that the noise tolerance of this improved technique is roughly optimal with respect to the tolerance of the underlying statistical query algorithm, and that it provides a smooth tradeoff between the tolerable amounts of the two types of noise. Finally, we show that statistical query simulation yields learning algorithms for other combinations of noise models, demonstrating that the statistical query specification of an algorithm captures its generic fault tolerance.
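To illustrate what simulating a statistical query from noisy data involves, the following is a minimal sketch of the standard correction for classification noise alone. It assumes the noise rate η is known and below 1/2, and it is not the hybrid-noise simulation developed in the paper; the names `simulate_sq`, `draw_noisy_sample`, and the query `chi` are hypothetical. A query χ(x, ℓ) ∈ {0, 1} has true value P = Pr[χ(x, f(x)) = 1]. When labels are flipped independently with probability η, the observed mean satisfies N = (1 − 2η)P + ηS, where S = E[χ(x, 0) + χ(x, 1)] does not depend on the label and can be estimated from the same noisy sample. Solving gives P = (N − ηS)/(1 − 2η).

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]  # (instance x, possibly noisy label in {0, 1})


def simulate_sq(chi: Callable[[float, int], int],
                noisy_sample: Sequence[Example],
                eta: float) -> float:
    """Estimate P = Pr[chi(x, f(x)) = 1] from labels flipped at known rate eta.

    Sketch of the classification-noise correction only (no malicious errors).
    """
    if not 0.0 <= eta < 0.5:
        raise ValueError("eta must lie in [0, 1/2)")
    m = len(noisy_sample)
    # N: empirical mean of the query on the noisy labels.
    noisy_mean = sum(chi(x, label) for x, label in noisy_sample) / m
    # S: label-independent term E[chi(x, 0) + chi(x, 1)].
    both_mean = sum(chi(x, 0) + chi(x, 1) for x, _ in noisy_sample) / m
    # Solve N = (1 - 2*eta) * P + eta * S for P.
    return (noisy_mean - eta * both_mean) / (1.0 - 2.0 * eta)


def draw_noisy_sample(m: int, eta: float, rng: random.Random) -> List[Example]:
    """Hypothetical setup: x ~ Uniform[0, 1], target f(x) = [x > 0.5],
    each label flipped independently with probability eta."""
    sample = []
    for _ in range(m):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < eta:
            label = 1 - label
        sample.append((x, label))
    return sample


def positive_and_above_02(x: float, label: int) -> int:
    """Query "label is 1 and x > 0.2"; its true value here is Pr[x > 0.5] = 0.5."""
    return int(label == 1 and x > 0.2)


if __name__ == "__main__":
    rng = random.Random(0)
    eta = 0.2
    sample = draw_noisy_sample(200_000, eta, rng)
    naive = sum(positive_and_above_02(x, l) for x, l in sample) / len(sample)
    corrected = simulate_sq(positive_and_above_02, sample, eta)
    print(f"naive estimate:     {naive:.3f}")      # about 0.46 in expectation
    print(f"corrected estimate: {corrected:.3f}")  # about 0.5 in expectation
```

If, in addition, a β fraction of the sample is chosen adversarially, each empirical average above can move by at most β (for N) or 2β (for S, whose terms lie in [0, 2]), so this particular estimator can be off by up to β(1 + 2η)/(1 − 2η) on top of ordinary sampling error. The penalty grows as η approaches 1/2, which gives some intuition for why the tolerable amounts of the two kinds of noise trade off against each other.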