Decision-Focused Model-based Reinforcement Learning for Reward Transfer

Abhishek Sharma, Sonali Parbhoo, Omer Gottesman, Finale Doshi-Velez
Proceedings of the 9th Machine Learning for Healthcare Conference, PMLR 252, 2024.

Abstract

Model-based reinforcement learning (MBRL) provides a way to learn a transition model of the environment, which can then be used to plan personalized policies for different patient cohorts, and to understand the dynamics involved in the decision-making process. However, standard MBRL algorithms are either sensitive to changes in the reward function or achieve suboptimal performance on the task when the transition model is restricted. Motivated by the need to use simple and interpretable models in critical domains such as healthcare, we propose a novel robust decision-focused (RDF) algorithm that learns a transition model that achieves high returns while being robust to changes in the reward function. We demonstrate that our RDF algorithm can be used with several model classes and planning algorithms. We also provide theoretical and empirical evidence, on a variety of simulators and real patient data, that RDF can learn simple yet effective models that can be used to plan personalized policies.
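The first sentence of the abstract describes the generic MBRL pipeline: fit a transition model from logged data once, then plan in that model for whichever reward a cohort requires. The sketch below illustrates only that generic pipeline under stated assumptions (a small tabular MDP, a count-based maximum-likelihood model, and value iteration as the planner). It is not the RDF algorithm, whose decision-focused training objective is not reproduced here. The chain environment, the two reward functions, and the helper names `estimate_transitions`, `value_iteration`, and `policy_value` are all hypothetical.

```python
import numpy as np


def estimate_transitions(transitions, n_states, n_actions, prior=1e-3):
    """Count-based maximum-likelihood estimate of T[s, a, s'] from (s, a, s') tuples.

    A small pseudo-count keeps every row a valid distribution even if an
    (s, a) pair was never observed.
    """
    counts = np.full((n_states, n_actions, n_states), prior)
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1.0
    return counts / counts.sum(axis=2, keepdims=True)


def value_iteration(T, R, gamma=0.95, tol=1e-8):
    """Plan in a (learned) model T with reward R of shape (n_states, n_actions)."""
    V = np.zeros(T.shape[0])
    while True:
        Q = R + gamma * (T @ V)  # (S, A, S) @ (S,) -> (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new


def policy_value(T, R, policy, gamma=0.95):
    """Exact value of a deterministic policy: solve (I - gamma * P_pi) V = R_pi."""
    idx = np.arange(T.shape[0])
    P_pi = T[idx, policy]  # (S, S) transition matrix under the policy
    R_pi = R[idx, policy]  # (S,) reward under the policy
    return np.linalg.solve(np.eye(T.shape[0]) - gamma * P_pi, R_pi)


# Hypothetical 4-state chain: action 0 moves left, action 1 moves right.
S, A = 4, 2
T_true = np.zeros((S, A, S))
for s in range(S):
    T_true[s, 0, max(s - 1, 0)] = 1.0
    T_true[s, 1, min(s + 1, S - 1)] = 1.0

# Log transitions from uniformly random states and actions, then fit the model once.
rng = np.random.default_rng(0)
data = []
for _ in range(2000):
    s, a = rng.integers(S), rng.integers(A)
    data.append((s, a, rng.choice(S, p=T_true[s, a])))
T_hat = estimate_transitions(data, S, A)

# Two reward functions over the same dynamics: reach the right end or the left end.
R_right = np.zeros((S, A))
R_right[S - 1, :] = 1.0
R_left = np.zeros((S, A))
R_left[0, :] = 1.0

for name, R in (("right", R_right), ("left", R_left)):
    policy, _ = value_iteration(T_hat, R)
    print(name, policy, policy_value(T_true, R, policy))
```

Because the learned model does not depend on the reward, the same `T_hat` is re-planned for both rewards without refitting, and `policy_value` scores each planned policy under the true dynamics. In this toy case the tabular model class is unrestricted, so maximum likelihood suffices; the abstract's concern is the restricted-model setting, where a likelihood-fitted model can still lead to suboptimal policies.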

Cite this Paper


BibTeX
@InProceedings{pmlr-v252-sharma24a,
  title = {Decision-Focused Model-based Reinforcement Learning for Reward Transfer},
  author = {Sharma, Abhishek and Parbhoo, Sonali and Gottesman, Omer and Doshi-Velez, Finale},
  booktitle = {Proceedings of the 9th Machine Learning for Healthcare Conference},
  year = {2024},
  editor = {Deshpande, Kaivalya and Fiterau, Madalina and Joshi, Shalmali and Lipton, Zachary and Ranganath, Rajesh and Urteaga, Iñigo},
  volume = {252},
  series = {Proceedings of Machine Learning Research},
  month = {16--17 Aug},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v252/main/assets/sharma24a/sharma24a.pdf},
  url = {https://proceedings.mlr.press/v252/sharma24a.html},
  abstract = {Model-based reinforcement learning (MBRL) provides a way to learn a transition model of the environment, which can then be used to plan personalized policies for different patient cohorts, and to understand the dynamics involved in the decision-making process. However, standard MBRL algorithms are either sensitive to changes in the reward function or achieve suboptimal performance on the task when the transition model is restricted. Motivated by the need to use simple and interpretable models in critical domains such as healthcare, we propose a novel robust decision-focused (RDF) algorithm that learns a transition model that achieves high returns while being robust to changes in the reward function. We demonstrate that our RDF algorithm can be used with several model classes and planning algorithms. We also provide theoretical and empirical evidence, on a variety of simulators and real patient data, that RDF can learn simple yet effective models that can be used to plan personalized policies.}
}
Endnote
%0 Conference Paper
%T Decision-Focused Model-based Reinforcement Learning for Reward Transfer
%A Abhishek Sharma
%A Sonali Parbhoo
%A Omer Gottesman
%A Finale Doshi-Velez
%B Proceedings of the 9th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Kaivalya Deshpande
%E Madalina Fiterau
%E Shalmali Joshi
%E Zachary Lipton
%E Rajesh Ranganath
%E Iñigo Urteaga
%F pmlr-v252-sharma24a
%I PMLR
%U https://proceedings.mlr.press/v252/sharma24a.html
%V 252
%X Model-based reinforcement learning (MBRL) provides a way to learn a transition model of the environment, which can then be used to plan personalized policies for different patient cohorts, and to understand the dynamics involved in the decision-making process. However, standard MBRL algorithms are either sensitive to changes in the reward function or achieve suboptimal performance on the task when the transition model is restricted. Motivated by the need to use simple and interpretable models in critical domains such as healthcare, we propose a novel robust decision-focused (RDF) algorithm that learns a transition model that achieves high returns while being robust to changes in the reward function. We demonstrate that our RDF algorithm can be used with several model classes and planning algorithms. We also provide theoretical and empirical evidence, on a variety of simulators and real patient data, that RDF can learn simple yet effective models that can be used to plan personalized policies.
APA
Sharma, A., Parbhoo, S., Gottesman, O. & Doshi-Velez, F. (2024). Decision-Focused Model-based Reinforcement Learning for Reward Transfer. Proceedings of the 9th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 252. Available from https://proceedings.mlr.press/v252/sharma24a.html.
