Off-Policy Evaluation under Nonignorable Missing Data

Han Wang, Yang Xu, Wenbin Lu, Rui Song
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:65020-65058, 2025.

Abstract

Off-Policy Evaluation (OPE) aims to estimate the value of a target policy using offline data collected from potentially different policies. In real-world applications, however, logged data often suffers from missingness. While OPE has been extensively studied in the literature, a theoretical understanding of how missing data affects OPE results is still lacking. In this paper, we investigate OPE in the presence of monotone missingness and theoretically demonstrate that the value estimates remain unbiased under ignorable missingness but can be biased under nonignorable (informative) missingness. To restore the consistency of value estimation, we propose an inverse probability weighting value estimator and conduct statistical inference to quantify the uncertainty of the estimates. Through a series of numerical experiments, we empirically demonstrate that our proposed estimator yields more reliable value inference under missing data.
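
To make the reweighting idea concrete, below is a minimal NumPy sketch (our illustration, not the paper's exact estimator or notation): a per-decision importance-sampling value estimate in which fully observed trajectories are additionally reweighted by their inverse observation probabilities. The synthetic environment, the constant policies, and the logistic dropout model, which makes completeness depend on the final reward (unobserved for incomplete trajectories, hence nonignorable), are all hypothetical; monotone dropout is simplified to "fully observed vs. not," and the true observation probabilities are plugged in where, in practice, they would have to be estimated.

    # Sketch under assumptions stated above; not the authors' method.
    import numpy as np

    rng = np.random.default_rng(0)
    n, T, gamma = 20000, 4, 0.9

    # --- Synthetic logged data (all quantities illustrative) ---
    s = rng.integers(0, 2, size=(n, T))                  # binary states
    b = np.full((n, T), 0.5)                             # behavior policy b(a=1|s)
    a = (rng.random((n, T)) < b).astype(int)             # logged actions
    r = a + 0.5 * s + rng.normal(0, 0.1, size=(n, T))    # rewards

    # --- Per-decision importance-sampling return for target policy pi(a=1|s)=0.8 ---
    pi = np.where(a == 1, 0.8, 0.2)                      # pi(a_t|s_t) for taken action
    b_taken = np.where(a == 1, b, 1 - b)                 # b(a_t|s_t) for taken action
    rho = np.cumprod(pi / b_taken, axis=1)               # cumulative importance ratios
    disc = gamma ** np.arange(T)
    returns = (disc * rho * r).sum(axis=1)               # sum_t gamma^t rho_{1:t} r_t

    # --- Nonignorable missingness: completeness depends on the final reward,
    #     which is exactly what an incomplete trajectory fails to record ---
    p_obs = 1.0 / (1.0 + np.exp(-(1.0 - 1.5 * r[:, -1])))  # hypothetical dropout model
    observed = rng.random(n) < p_obs

    v_full = returns.mean()                              # oracle: no missingness
    v_naive = returns[observed].mean()                   # complete-case: biased here
    w = 1.0 / p_obs[observed]                            # inverse observation probs
    v_ipw = (w * returns[observed]).sum() / w.sum()      # reweighted (Hajek) estimate

    print(f"full data     : {v_full: .3f}")
    print(f"complete case : {v_naive: .3f}")
    print(f"IPW-corrected : {v_ipw: .3f}")

With this outcome-dependent dropout, high-reward trajectories are less likely to be complete, so the complete-case average is systematically biased downward, while the reweighted average tracks the full-data value; in practice the observation probabilities would themselves be estimated, with the resulting uncertainty quantified by the paper's inference procedure.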

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wang25dt,
  title     = {Off-Policy Evaluation under Nonignorable Missing Data},
  author    = {Wang, Han and Xu, Yang and Lu, Wenbin and Song, Rui},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {65020--65058},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wang25dt/wang25dt.pdf},
  url       = {https://proceedings.mlr.press/v267/wang25dt.html},
  abstract  = {Off-Policy Evaluation (OPE) aims to estimate the value of a target policy using offline data collected from potentially different policies. In real-world applications, however, logged data often suffers from missingness. While OPE has been extensively studied in the literature, a theoretical understanding of how missing data affects OPE results remains unclear. In this paper, we investigate OPE in the presence of monotone missingness and theoretically demonstrate that the value estimates remain unbiased under ignorable missingness but can be biased under nonignorable (informative) missingness. To retain the consistency of value estimation, we propose an inverse probability weighting value estimator and conduct statistical inference to quantify the uncertainty of the estimates. Through a series of numerical experiments, we empirically demonstrate that our proposed estimator yields a more reliable value inference under missing data.}
}
Endnote
%0 Conference Paper
%T Off-Policy Evaluation under Nonignorable Missing Data
%A Han Wang
%A Yang Xu
%A Wenbin Lu
%A Rui Song
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wang25dt
%I PMLR
%P 65020--65058
%U https://proceedings.mlr.press/v267/wang25dt.html
%V 267
%X Off-Policy Evaluation (OPE) aims to estimate the value of a target policy using offline data collected from potentially different policies. In real-world applications, however, logged data often suffers from missingness. While OPE has been extensively studied in the literature, a theoretical understanding of how missing data affects OPE results remains unclear. In this paper, we investigate OPE in the presence of monotone missingness and theoretically demonstrate that the value estimates remain unbiased under ignorable missingness but can be biased under nonignorable (informative) missingness. To retain the consistency of value estimation, we propose an inverse probability weighting value estimator and conduct statistical inference to quantify the uncertainty of the estimates. Through a series of numerical experiments, we empirically demonstrate that our proposed estimator yields a more reliable value inference under missing data.
APA
Wang, H., Xu, Y., Lu, W. & Song, R. (2025). Off-Policy Evaluation under Nonignorable Missing Data. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:65020-65058. Available from https://proceedings.mlr.press/v267/wang25dt.html.
