Complementary Sum Sampling for Likelihood Approximation in Large Scale Classification
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1030-1038, 2017.
We consider training probabilistic classifiers when the number of classes is too large to perform exact normalisation over all classes. We show that the high variance of standard sampling approximations arises simply from not including the correct class of the datapoint in the approximation. To account for this, we explicitly sum over a subset of classes and sample over the remainder. We show that this simple approach is competitive with recently introduced non-likelihood-based approximations.
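As a rough sketch of this idea (a hypothetical NumPy implementation, not the paper's code): the normaliser Z = Σ_c exp(s_c) is split into an exact sum over a small explicit set of classes that contains the true class, plus a Monte Carlo estimate of the remaining classes' contribution, here using uniform sampling with the importance weight |complement| / num_samples.

```python
import numpy as np

def css_log_norm(scores, explicit, num_samples, rng):
    """Complementary-sum-sampling style estimate of log Z = log sum_c exp(scores[c]).

    `explicit` holds class indices summed exactly (it should include the
    true class of the datapoint); the complement's contribution is
    estimated by uniform sampling with weight |complement| / num_samples.
    This is an illustrative sketch, not the authors' reference code.
    """
    n = len(scores)
    explicit = np.asarray(explicit)
    mask = np.ones(n, dtype=bool)
    mask[explicit] = False
    complement = np.flatnonzero(mask)

    # Uniform samples (with replacement) from the non-explicit classes.
    sampled = rng.choice(complement, size=num_samples, replace=True)
    log_weight = np.log(len(complement) / num_samples)

    # Exact terms for the explicit set, importance-weighted terms for the rest,
    # combined with a numerically stable log-sum-exp.
    terms = np.concatenate([scores[explicit], scores[sampled] + log_weight])
    m = terms.max()
    return m + np.log(np.exp(terms - m).sum())

rng = np.random.default_rng(0)
scores = rng.normal(size=10_000)   # toy class scores s_c
y = 3                              # true class, always summed exactly
est = css_log_norm(scores, explicit=[y], num_samples=200, rng=rng)
exact = np.log(np.exp(scores).sum())
```

Because the true class is always inside the exact sum, its (typically large) term never has to be recovered by chance through sampling, which is the variance reduction the abstract describes.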