On Combining Bags to Better Learn from Label Proportions
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:5913-5927, 2022.
In the framework of learning from label proportions (LLP), the goal is to learn a good instance-level label predictor from the observed label proportions of bags of instances. Most LLP algorithms either explicitly or implicitly make assumptions about the nature of bag distributions with respect to the actual labels and instances, or cleverly adapt supervised learning techniques to suit LLP. In practical applications, however, the scale and nature of the data can render such assumptions invalid and many of the algorithms impractical. In this paper we address the hard problem of solving LLP with provable error bounds while being bag-distribution agnostic and model agnostic. We first propose the concept of generalized bags, an extension of bags, and then devise an algorithm to combine bag distributions, if possible, into good generalized bag distributions. We show that, with high probability, any classifier optimizing the squared Euclidean label-proportion loss on such a generalized bag distribution is guaranteed to minimize the instance-level loss as well. The predictive quality of our method is evaluated experimentally, and it equals or betters previous methods on pseudo-synthetic and real-world datasets.
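To make the bag-level objective concrete, the following is a minimal sketch of a squared label-proportion loss on ordinary bags with binary labels. It is an illustration only, not the paper's algorithm: the function name `bag_proportion_loss`, the toy data, and the choice to average over bags are all assumptions, and the construction of generalized bags is not shown.

```python
import numpy as np


def bag_proportion_loss(predictions, bag_ids, bag_proportions):
    """Mean over bags of (predicted proportion - observed proportion)^2.

    Hypothetical helper for illustration; assumes binary labels and that
    every bag in `bag_proportions` contains at least one instance.
    """
    predictions = np.asarray(predictions, dtype=float)
    bag_ids = np.asarray(bag_ids)
    losses = []
    for bag, observed in bag_proportions.items():
        # Predicted proportion is the mean prediction over the bag's instances.
        predicted = predictions[bag_ids == bag].mean()
        losses.append((predicted - observed) ** 2)
    return float(np.mean(losses))


# Toy data (assumed): five instances in two bags; only proportions are observed.
bag_ids = [0, 0, 0, 1, 1]
true_labels = [1, 1, 0, 0, 1]
bag_proportions = {0: 2 / 3, 1: 1 / 2}

# A predictor that matches every bag proportion but mislabels most instances.
shuffled_labels = [0, 1, 1, 1, 0]

loss_true = bag_proportion_loss(true_labels, bag_ids, bag_proportions)
loss_shuffled = bag_proportion_loss(shuffled_labels, bag_ids, bag_proportions)
instance_error = float(np.mean(np.array(true_labels) != np.array(shuffled_labels)))
```

In this toy example both label vectors match the observed proportions, yet they disagree on four of the five instances. On arbitrary bags, a small proportion loss does not by itself imply a small instance-level loss, which is the gap that the guarantee for good generalized bag distributions is meant to close.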