Statistical Inference Under Constrained Selection Bias

Santiago Cortes-Gomez, Mateo Dulce Rubio, Carlos Miguel Patiño, Bryan Wilder
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:9361-9379, 2024.

Abstract

Large-scale datasets are increasingly being used to inform decision making. While this effort aims to ground policy in real-world evidence, challenges have arisen as selection bias and other forms of distribution shift often plague observational data. Previous attempts to provide robust inference have given guarantees depending on a user-specified amount of possible distribution shift (e.g., the maximum KL divergence between the observed and target distributions). However, decision makers will often have additional knowledge about the target distribution which constrains the kind of possible shifts. To leverage such information, we propose a framework that enables statistical inference in the presence of selection bias that obeys user-specified constraints in the form of functions whose expectation is known under the target distribution. The output is high-probability bounds on the value of an estimand for the target distribution. Hence, our method leverages domain knowledge in order to partially identify a wide class of estimands. We analyze the computational and statistical properties of methods to estimate these bounds and show that our method can produce informative bounds on a variety of simulated and semisynthetic tasks, as well as in a real-world use case.
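
To make the setup concrete, the following is a minimal sketch, not the authors' implementation, of the plug-in idea the abstract describes: bound a target-distribution expectation by optimizing importance weights over the observed (biased) sample, subject to user-specified constraint functions whose expectations under the target distribution are known. The function name plugin_bounds, the optional density-ratio cap max_ratio, and the toy constraint E[X] = 0 are illustrative assumptions rather than details taken from the paper, and the sketch omits the high-probability corrections for sampling error that the paper provides.

# A minimal sketch (not the authors' code) of constrained reweighting for
# partial identification: choose weights over the observed sample that satisfy
# the known target-distribution expectations, and optimize the estimand over
# all such weightings to obtain lower and upper plug-in bounds.
import numpy as np
from scipy.optimize import linprog


def plugin_bounds(f_vals, g_vals, g_targets, max_ratio=None):
    """Plug-in lower/upper bounds on E_target[f(X)].

    f_vals    : (n,) estimand f evaluated on the n observed samples
    g_vals    : (n, k) constraint functions evaluated on the samples
    g_targets : (k,) known expectations of the constraint functions
                under the target distribution
    max_ratio : optional cap on the implied density ratio dQ/dP
                (per-sample weight <= max_ratio / n); an illustrative
                assumption, not a quantity from the paper
    """
    n, _ = g_vals.shape
    # Equality constraints: weights sum to 1 and reproduce each known expectation.
    A_eq = np.vstack([np.ones((1, n)), g_vals.T])
    b_eq = np.concatenate([[1.0], g_targets])
    ub = None if max_ratio is None else max_ratio / n
    bounds = [(0.0, ub)] * n

    # Minimize and maximize the weighted estimand over all feasible weightings.
    res_lo = linprog(c=f_vals, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    res_hi = linprog(c=-f_vals, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not (res_lo.success and res_hi.success):
        raise ValueError("The constraints are infeasible on this sample.")
    return res_lo.fun, -res_hi.fun


# Toy usage: the observed sample over-represents large x (selection bias),
# but we know the target mean E_target[X] = 0 and want bounds on P(X > 1).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, size=1000)            # biased sample
f_vals = (x > 1.0).astype(float)              # estimand: indicator of X > 1
g_vals = x.reshape(-1, 1)                     # constraint function g(x) = x
lo, hi = plugin_bounds(f_vals, g_vals, np.array([0.0]), max_ratio=10.0)
print(f"P(X > 1) under the target distribution lies in [{lo:.3f}, {hi:.3f}]")

Each endpoint is a linear program over the sample weights, so this kind of computation scales to large datasets; the paper's bounds additionally account for statistical uncertainty in the sample, so they hold with high probability rather than only in this plug-in sense.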

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-cortes-gomez24a,
  title     = {Statistical Inference Under Constrained Selection Bias},
  author    = {Cortes-Gomez, Santiago and Dulce Rubio, Mateo and Pati\~{n}o, Carlos Miguel and Wilder, Bryan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {9361--9379},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/cortes-gomez24a/cortes-gomez24a.pdf},
  url       = {https://proceedings.mlr.press/v235/cortes-gomez24a.html}
}
Endnote
%0 Conference Paper
%T Statistical Inference Under Constrained Selection Bias
%A Santiago Cortes-Gomez
%A Mateo Dulce Rubio
%A Carlos Miguel Patiño
%A Bryan Wilder
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-cortes-gomez24a
%I PMLR
%P 9361--9379
%U https://proceedings.mlr.press/v235/cortes-gomez24a.html
%V 235
APA
Cortes-Gomez, S., Dulce Rubio, M., Patiño, C.M. & Wilder, B. (2024). Statistical Inference Under Constrained Selection Bias. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:9361-9379. Available from https://proceedings.mlr.press/v235/cortes-gomez24a.html.