Multicalibrated Partitions for Importance Weights

Parikshit Gopalan, Omer Reingold, Vatsal Sharan, Udi Wieder
Proceedings of The 33rd International Conference on Algorithmic Learning Theory, PMLR 167:408-435, 2022.

Abstract

The ratios of the probabilities that two distributions assign to points in the domain are called importance weights or density ratios, and they play a fundamental role in machine learning and information theory. However, strong lower bounds are known for point-wise accurate estimation of density ratios, and most theoretical guarantees require strong assumptions about the distributions. We motivate the problem of seeking accuracy guarantees for the distribution of importance weights conditioned on sub-populations belonging to a family $\mathcal{C}$ of subsets of the domain. We formulate {\em sandwiching bounds} for sets (upper and lower bounds on the expected importance weight conditioned on a set) as a notion of set-wise accuracy for importance weights. We argue that they capture intuitive expectations about importance weights and are not subject to the strong lower bounds for point-wise guarantees. We introduce the notion of multicalibrated partitions for a class $\mathcal{C}$, inspired by recent work on multicalibration in supervised learning, and show that the importance weights resulting from such partitions satisfy sandwiching bounds. In contrast, we show that importance weights returned by popular algorithms in the literature may violate the sandwiching bounds. We present an efficient algorithm for constructing multicalibrated partitions, given a weak agnostic learner for the class $\mathcal{C}$.
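As a rough illustration of these notions (a minimal sketch, not the paper's algorithm: the toy distributions, the partition, and the test set below are all hypothetical choices), the following computes partition-based importance weights on a small discrete domain. The estimated weight is constant on each cell of the partition, so the reweighted $p$-mass of any cell matches its $q$-mass exactly, while for an arbitrary set it only approximates the true mass:

```python
import numpy as np

# Hypothetical source distribution p and target distribution q over 8 points.
p = np.array([0.20, 0.10, 0.15, 0.05, 0.10, 0.10, 0.20, 0.10])
q = np.array([0.05, 0.15, 0.20, 0.10, 0.10, 0.15, 0.05, 0.20])

# A (hypothetical) partition of the domain into cells; the estimated
# importance weight is constant on each cell S and equals q(S)/p(S).
partition = [np.array([0, 1]), np.array([2, 3, 4]), np.array([5, 6, 7])]
w_hat = np.zeros(8)
for cell in partition:
    w_hat[cell] = q[cell].sum() / p[cell].sum()

# Reweighting p by w_hat preserves total mass: E_p[w_hat] = 1.
print(round((p * w_hat).sum(), 4))

# Sandwiching-style check on a set C straddling cells: the reweighted
# p-mass of C should track (but need not equal) the true q-mass of C.
C = np.array([1, 2, 5])
reweighted_mass = (p[C] * w_hat[C]).sum()
true_mass = q[C].sum()
print(round(reweighted_mass, 4), round(true_mass, 4))
```

On each cell of the partition itself, the reweighted mass equals the true mass exactly (this is the calibration property of cell-constant weights); for the straddling set `C` the two quantities differ, which is the gap the paper's sandwiching bounds control for all sets in the class $\mathcal{C}$.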

Cite this Paper


BibTeX
@InProceedings{pmlr-v167-gopalan22a,
  title     = {Multicalibrated Partitions for Importance Weights},
  author    = {Gopalan, Parikshit and Reingold, Omer and Sharan, Vatsal and Wieder, Udi},
  booktitle = {Proceedings of The 33rd International Conference on Algorithmic Learning Theory},
  pages     = {408--435},
  year      = {2022},
  editor    = {Dasgupta, Sanjoy and Haghtalab, Nika},
  volume    = {167},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Mar--01 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v167/gopalan22a/gopalan22a.pdf},
  url       = {https://proceedings.mlr.press/v167/gopalan22a.html},
  abstract  = {The ratio between the probability that two distributions assign to points in the domain are called importance weights or density ratios and they play a fundamental role in machine learning and information theory. However, there are strong lower bounds known for point-wise accurate estimation of density ratios, and most theoretical guarantees require strong assumptions about the distributions. We motivate the problem of seeking accuracy guarantees for the distribution of importance weights conditioned on sub-populations belonging to a family $\mathcal{C}$ of subsets of the domain. We formulate {\em sandwiching bounds} for sets: upper and lower bounds on the expected importance weight conditioned on a set; as a notion of set-wise accuracy for importance weights. We argue that they capture intuitive expectations about importance weights, and are not subject to the strong lower bounds for point-wise guarantees. We introduce the notion of multicalibrated partitions for a class $\mathcal{C}$, inspired by recent work on multi-calibration in supervised learning and show that the importance weights resulting from such partitions do satisfy sandwiching bounds. In contrast, we show that importance weights returned by popular algorithms in the literature may violate the sandwiching bounds. We present an efficient algorithm for constructing multi-calibrated partitions, given a weak agnostic learner for the class $\mathcal{C}$.}
}
Endnote
%0 Conference Paper
%T Multicalibrated Partitions for Importance Weights
%A Parikshit Gopalan
%A Omer Reingold
%A Vatsal Sharan
%A Udi Wieder
%B Proceedings of The 33rd International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Sanjoy Dasgupta
%E Nika Haghtalab
%F pmlr-v167-gopalan22a
%I PMLR
%P 408--435
%U https://proceedings.mlr.press/v167/gopalan22a.html
%V 167
%X The ratio between the probability that two distributions assign to points in the domain are called importance weights or density ratios and they play a fundamental role in machine learning and information theory. However, there are strong lower bounds known for point-wise accurate estimation of density ratios, and most theoretical guarantees require strong assumptions about the distributions. We motivate the problem of seeking accuracy guarantees for the distribution of importance weights conditioned on sub-populations belonging to a family $\mathcal{C}$ of subsets of the domain. We formulate {\em sandwiching bounds} for sets: upper and lower bounds on the expected importance weight conditioned on a set; as a notion of set-wise accuracy for importance weights. We argue that they capture intuitive expectations about importance weights, and are not subject to the strong lower bounds for point-wise guarantees. We introduce the notion of multicalibrated partitions for a class $\mathcal{C}$, inspired by recent work on multi-calibration in supervised learning and show that the importance weights resulting from such partitions do satisfy sandwiching bounds. In contrast, we show that importance weights returned by popular algorithms in the literature may violate the sandwiching bounds. We present an efficient algorithm for constructing multi-calibrated partitions, given a weak agnostic learner for the class $\mathcal{C}$.
APA
Gopalan, P., Reingold, O., Sharan, V. & Wieder, U. (2022). Multicalibrated Partitions for Importance Weights. Proceedings of The 33rd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 167:408-435. Available from https://proceedings.mlr.press/v167/gopalan22a.html.
