Multiply Robust Off-policy Evaluation and Learning under Truncation by Death

Jianing Chu, Shu Yang, Wenbin Lu
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:6195-6227, 2023.

Abstract

Typical off-policy evaluation (OPE) and off-policy learning (OPL) are not well-defined problems under "truncation by death", where the outcome of interest is not defined after some events, such as death. The standard OPE no longer yields consistent estimators, and the standard OPL results in suboptimal policies. In this paper, we formulate OPE and OPL using principal stratification under "truncation by death". We propose a survivor value function for a subpopulation whose outcomes are always defined regardless of treatment conditions. We establish a novel identification strategy under principal ignorability, and derive the semiparametric efficiency bound of an OPE estimator. Then, we propose multiply robust estimators for OPE and OPL. We show that the proposed estimators are consistent and asymptotically normal even with flexible semi/nonparametric models for nuisance functions approximation. Moreover, under mild rate conditions of nuisance functions approximation, the estimators achieve the semiparametric efficiency bound. Finally, we conduct experiments to demonstrate the empirical performance of the proposed estimators.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-chu23c, title = {Multiply Robust Off-policy Evaluation and Learning under Truncation by Death}, author = {Chu, Jianing and Yang, Shu and Lu, Wenbin}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {6195--6227}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/chu23c/chu23c.pdf}, url = {https://proceedings.mlr.press/v202/chu23c.html}, abstract = {Typical off-policy evaluation (OPE) and off-policy learning (OPL) are not well-defined problems under "truncation by death", where the outcome of interest is not defined after some events, such as death. The standard OPE no longer yields consistent estimators, and the standard OPL results in suboptimal policies. In this paper, we formulate OPE and OPL using principal stratification under "truncation by death". We propose a survivor value function for a subpopulation whose outcomes are always defined regardless of treatment conditions. We establish a novel identification strategy under principal ignorability, and derive the semiparametric efficiency bound of an OPE estimator. Then, we propose multiply robust estimators for OPE and OPL. We show that the proposed estimators are consistent and asymptotically normal even with flexible semi/nonparametric models for nuisance functions approximation. Moreover, under mild rate conditions of nuisance functions approximation, the estimators achieve the semiparametric efficiency bound. Finally, we conduct experiments to demonstrate the empirical performance of the proposed estimators.} }
Endnote
%0 Conference Paper %T Multiply Robust Off-policy Evaluation and Learning under Truncation by Death %A Jianing Chu %A Shu Yang %A Wenbin Lu %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-chu23c %I PMLR %P 6195--6227 %U https://proceedings.mlr.press/v202/chu23c.html %V 202 %X Typical off-policy evaluation (OPE) and off-policy learning (OPL) are not well-defined problems under "truncation by death", where the outcome of interest is not defined after some events, such as death. The standard OPE no longer yields consistent estimators, and the standard OPL results in suboptimal policies. In this paper, we formulate OPE and OPL using principal stratification under "truncation by death". We propose a survivor value function for a subpopulation whose outcomes are always defined regardless of treatment conditions. We establish a novel identification strategy under principal ignorability, and derive the semiparametric efficiency bound of an OPE estimator. Then, we propose multiply robust estimators for OPE and OPL. We show that the proposed estimators are consistent and asymptotically normal even with flexible semi/nonparametric models for nuisance functions approximation. Moreover, under mild rate conditions of nuisance functions approximation, the estimators achieve the semiparametric efficiency bound. Finally, we conduct experiments to demonstrate the empirical performance of the proposed estimators.
APA
Chu, J., Yang, S. & Lu, W.. (2023). Multiply Robust Off-policy Evaluation and Learning under Truncation by Death. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:6195-6227 Available from https://proceedings.mlr.press/v202/chu23c.html.

Related Material