Predicting with Distributions
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1214-1241, 2017.
We consider a new learning model in which a joint distribution over vector pairs $(x,y)$ is determined by an unknown function $c(x)$ that maps input vectors $x$ not to individual outputs, but to entire \emph{distributions} over output vectors $y$. Our main results take the form of rather general reductions from our model to algorithms for PAC learning the function class and the distribution class separately, and show that virtually every such combination yields an efficient algorithm in our model. Our methods include a randomized reduction to classification noise and an application of Le Cam’s method to obtain robust learning algorithms.
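As an illustration of the data-generating process described above, the following minimal sketch draws pairs $(x,y)$ by first drawing $x$, then letting a function $c$ select a distribution, then sampling $y$ from that distribution. Everything concrete here is an assumption made for illustration and is not taken from the paper: the particular threshold function `c`, the choice of two product-Bernoulli output distributions, the uniform input distribution, and all names.

```python
import random

def c(x):
    # Hypothetical target function: maps an input vector to the index
    # of an output distribution (assumed here to be a simple threshold).
    return 1 if x[0] + x[1] > 0 else 0

# Hypothetical output distributions over vectors y in {0,1}^2. Each is a
# product of independent Bernoulli coordinates with the listed biases.
DISTRIBUTIONS = {
    0: (0.9, 0.1),
    1: (0.2, 0.8),
}

def sample_y(biases, rng):
    # Draw one output vector y from a product-Bernoulli distribution.
    return tuple(1 if rng.random() < p else 0 for p in biases)

def draw_pair(rng):
    # Draw x (assumed uniform on [-1, 1]^2), then draw y from the
    # distribution that c assigns to x.
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    y = sample_y(DISTRIBUTIONS[c(x)], rng)
    return x, y

rng = random.Random(0)
sample = [draw_pair(rng) for _ in range(5)]
```

A learner in this setting sees only the pairs in `sample`; neither the value `c(x)` nor the biases in `DISTRIBUTIONS` are observed directly.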