POPCORN: Partially Observed Prediction Constrained Reinforcement Learning

Joseph Futoma, Michael Hughes, Finale Doshi-Velez
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3578-3588, 2020.

Abstract

Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
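The paper's actual objective is defined over POMDP parameters and off-policy value estimates, but the general "prediction-constrained" idea it builds on can be illustrated with a toy scalarization: score a model by its data fit, penalized whenever the policy it induces falls below a required value floor. The function and variable names below are hypothetical, chosen only for this sketch, and the numbers are made up.

```python
def prediction_constrained_objective(nll, policy_value, value_floor, lam=10.0):
    """Toy scalarized prediction-constrained objective (to be minimized).

    nll          : average negative log-likelihood of observations under the model
    policy_value : estimated (off-policy) value of the model's planned policy
    value_floor  : minimum acceptable policy value (the "constraint")
    lam          : penalty weight trading model fit against policy quality
    """
    # Penalize only the shortfall below the floor, so a model whose policy
    # already performs well is scored purely on how well it fits the data.
    shortfall = max(0.0, value_floor - policy_value)
    return nll + lam * shortfall

# A model that fits the data well but plans poorly is penalized heavily...
poor_planner = prediction_constrained_objective(nll=1.0, policy_value=0.2, value_floor=0.8)
# ...while a slightly worse fit with an adequate policy can score better.
good_planner = prediction_constrained_objective(nll=1.2, policy_value=0.9, value_floor=0.8)
```

This captures the abstract's point that the model that best fits the data (lowest `nll`) need not be the best model for planning; the constraint term is what steers learning toward models that support high-performing policies.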

Cite this Paper

BibTeX
@InProceedings{pmlr-v108-futoma20a,
  title     = {POPCORN: Partially Observed Prediction Constrained Reinforcement Learning},
  author    = {Futoma, Joseph and Hughes, Michael and Doshi-Velez, Finale},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {3578--3588},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/futoma20a/futoma20a.pdf},
  url       = {https://proceedings.mlr.press/v108/futoma20a.html},
  abstract  = {Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.}
}
Endnote
%0 Conference Paper
%T POPCORN: Partially Observed Prediction Constrained Reinforcement Learning
%A Joseph Futoma
%A Michael Hughes
%A Finale Doshi-Velez
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-futoma20a
%I PMLR
%P 3578--3588
%U https://proceedings.mlr.press/v108/futoma20a.html
%V 108
%X Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
APA
Futoma, J., Hughes, M., & Doshi-Velez, F. (2020). POPCORN: Partially Observed Prediction Constrained Reinforcement Learning. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:3578-3588. Available from https://proceedings.mlr.press/v108/futoma20a.html.