Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes

Chuhan Xie; Wenhao Yang; Zhihua Zhang

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:38227-38257, 2023.

Abstract

We study semiparametrically efficient estimation in off-policy evaluation (OPE) where the underlying Markov decision process (MDP) is linear with a known feature map. We characterize the variance lower bound for regular estimators in the linear MDP setting and propose an efficient estimator whose variance achieves that lower bound. Consistency and asymptotic normality of our estimator are established under mild conditions, which merely require the only infinite-dimensional nuisance parameter to be estimated at an $n^{-1/4}$ convergence rate. We also construct an asymptotically valid confidence interval for statistical inference and conduct simulation studies to validate our results. To our knowledge, this is the first work that concerns efficient estimation in the presence of a known structure of MDPs in the OPE literature.

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-xie23d,
  title = 	 {Semiparametrically Efficient Off-Policy Evaluation in Linear {M}arkov Decision Processes},
  author =       {Xie, Chuhan and Yang, Wenhao and Zhang, Zhihua},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {38227--38257},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/xie23d/xie23d.pdf},
  url = 	 {https://proceedings.mlr.press/v202/xie23d.html},
  abstract = 	 {We study semiparametrically efficient estimation in off-policy evaluation (OPE) where the underlying Markov decision process (MDP) is linear with a known feature map. We characterize the variance lower bound for regular estimators in the linear MDP setting and propose an efficient estimator whose variance achieves that lower bound. Consistency and asymptotic normality of our estimator are established under mild conditions, which merely require the only infinite-dimensional nuisance parameter to be estimated at an $n^{-1/4}$ convergence rate. We also construct an asymptotically valid confidence interval for statistical inference and conduct simulation studies to validate our results. To our knowledge, this is the first work that concerns efficient estimation in the presence of a known structure of MDPs in the OPE literature.}
}

Endnote

%0 Conference Paper
%T Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes
%A Chuhan Xie
%A Wenhao Yang
%A Zhihua Zhang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-xie23d
%I PMLR
%P 38227--38257
%U https://proceedings.mlr.press/v202/xie23d.html
%V 202
%X We study semiparametrically efficient estimation in off-policy evaluation (OPE) where the underlying Markov decision process (MDP) is linear with a known feature map. We characterize the variance lower bound for regular estimators in the linear MDP setting and propose an efficient estimator whose variance achieves that lower bound. Consistency and asymptotic normality of our estimator are established under mild conditions, which merely require the only infinite-dimensional nuisance parameter to be estimated at an $n^{-1/4}$ convergence rate. We also construct an asymptotically valid confidence interval for statistical inference and conduct simulation studies to validate our results. To our knowledge, this is the first work that concerns efficient estimation in the presence of a known structure of MDPs in the OPE literature.

APA

Xie, C., Yang, W. & Zhang, Z. (2023). Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:38227-38257. Available from https://proceedings.mlr.press/v202/xie23d.html.
