Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making

Chengchun Shi; Runzhe Wan; Rui Song; Wenbin Lu; Ling Leng

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8807-8817, 2020.

Abstract

The Markov assumption (MA) is fundamental to the empirical validity of reinforcement learning. In this paper, we propose a novel Forward-Backward Learning procedure to test MA in sequential decision making. The proposed test does not assume any parametric form on the joint distribution of the observed data and plays an important role for identifying the optimal policy in high-order Markov decision processes (MDPs) and partially observable MDPs. Theoretically, we establish the validity of our test. Empirically, we apply our test to both synthetic datasets and a real data example from mobile health studies to illustrate its usefulness.

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-shi20c,
  title = 	 {Does the {M}arkov Decision Process Fit the Data: Testing for the {M}arkov Property in Sequential Decision Making},
  author =       {Shi, Chengchun and Wan, Runzhe and Song, Rui and Lu, Wenbin and Leng, Ling},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {8807--8817},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/shi20c/shi20c.pdf},
  url = 	 {https://proceedings.mlr.press/v119/shi20c.html},
  abstract = 	 {The Markov assumption (MA) is fundamental to the empirical validity of reinforcement learning. In this paper, we propose a novel Forward-Backward Learning procedure to test MA in sequential decision making. The proposed test does not assume any parametric form on the joint distribution of the observed data and plays an important role for identifying the optimal policy in high-order Markov decision processes (MDPs) and partially observable MDPs. Theoretically, we establish the validity of our test. Empirically, we apply our test to both synthetic datasets and a real data example from mobile health studies to illustrate its usefulness.}
}

Endnote

%0 Conference Paper
%T Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making
%A Chengchun Shi
%A Runzhe Wan
%A Rui Song
%A Wenbin Lu
%A Ling Leng
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-shi20c
%I PMLR
%P 8807--8817
%U https://proceedings.mlr.press/v119/shi20c.html
%V 119
%X The Markov assumption (MA) is fundamental to the empirical validity of reinforcement learning. In this paper, we propose a novel Forward-Backward Learning procedure to test MA in sequential decision making. The proposed test does not assume any parametric form on the joint distribution of the observed data and plays an important role for identifying the optimal policy in high-order Markov decision processes (MDPs) and partially observable MDPs. Theoretically, we establish the validity of our test. Empirically, we apply our test to both synthetic datasets and a real data example from mobile health studies to illustrate its usefulness.

APA


Shi, C., Wan, R., Song, R., Lu, W. & Leng, L. (2020). Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8807-8817. Available from https://proceedings.mlr.press/v119/shi20c.html.
