Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination

Zhiyao Luo, Yangchen Pan, Peter Watkinson, Tingting Zhu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:33432-33465, 2024.

Abstract

In the rapidly changing healthcare landscape, the implementation of offline reinforcement learning (RL) in dynamic treatment regimes (DTRs) presents a mix of unprecedented opportunities and challenges. This position paper offers a critical examination of the current status of offline RL in the context of DTRs. We argue for a reassessment of applying RL in DTRs, citing concerns such as inconsistent and potentially inconclusive evaluation metrics, the absence of naive and supervised learning baselines, and the diverse choice of RL formulation in existing research. Through a case study with more than 17,000 evaluation experiments using a publicly available Sepsis dataset, we demonstrate that the performance of RL algorithms can vary significantly with changes in evaluation metrics and Markov Decision Process (MDP) formulations. Surprisingly, we observe that in some instances, RL algorithms can be surpassed by random baselines, depending on the policy evaluation method and reward design. This calls for more careful policy evaluation and algorithm development in future DTR work. Additionally, we discuss potential enhancements toward more reliable development of RL-based dynamic treatment regimes and invite further discussion within the community. Code is available at https://github.com/GilesLuo/ReassessDTR.
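To make the evaluation concern concrete, the following is a minimal, self-contained Python sketch (our illustration, not code from the paper or its repository) of weighted importance sampling (WIS), a common off-policy evaluation (OPE) metric in DTR studies. All policies, probabilities, and reward designs in it are simulated assumptions, chosen only to show how the estimated value of even a uniform-random policy shifts with the reward design and evaluation setup.

# Hypothetical sketch: weighted importance sampling (WIS) for off-policy
# evaluation. All data below is simulated; nothing here reproduces the
# paper's experiments.
import numpy as np

rng = np.random.default_rng(0)
n_episodes, horizon, n_actions = 500, 5, 4

def wis_estimate(behavior_probs, target_probs, rewards):
    """WIS estimate of the target policy's value.

    behavior_probs, target_probs: (n_episodes, horizon) probabilities of the
    actions actually taken, under the logging and evaluated policies.
    rewards: (n_episodes,) per-episode returns.
    """
    # Per-episode importance weight: product of per-step probability ratios.
    ratios = np.prod(target_probs / behavior_probs, axis=1)
    return float(np.sum(ratios * rewards) / np.sum(ratios))

# Simulated logged data from a near-uniform behavior (clinician) policy.
behavior_probs = rng.uniform(0.2, 0.3, size=(n_episodes, horizon))
# Two reward designs: a sparse terminal outcome (e.g., survival) vs. a
# dense shaped return built on top of it.
sparse_returns = rng.binomial(1, 0.8, size=n_episodes).astype(float)
dense_returns = sparse_returns + rng.normal(0.0, 0.5, size=n_episodes)

# A uniform-random baseline assigns 1/n_actions to every action.
random_probs = np.full((n_episodes, horizon), 1.0 / n_actions)
# A hypothetical "learned" policy that is merely more confident, not better.
learned_probs = np.clip(
    behavior_probs + rng.normal(0.0, 0.1, size=behavior_probs.shape), 0.05, 0.95
)

for name, returns in [("sparse reward", sparse_returns), ("dense reward", dense_returns)]:
    v_rand = wis_estimate(behavior_probs, random_probs, returns)
    v_learn = wis_estimate(behavior_probs, learned_probs, returns)
    print(f"{name}: WIS(random)={v_rand:.3f}  WIS(learned)={v_learn:.3f}")

Running the sketch prints one WIS estimate per policy and reward design; the point is not the numbers themselves but that the apparent ranking of "random" versus "learned" is sensitive to the evaluation setup and reward shaping, which is why the paper calls for naive and random baselines alongside more careful policy evaluation.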

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-luo24f,
  title     = {Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination},
  author    = {Luo, Zhiyao and Pan, Yangchen and Watkinson, Peter and Zhu, Tingting},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {33432--33465},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/luo24f/luo24f.pdf},
  url       = {https://proceedings.mlr.press/v235/luo24f.html}
}
Endnote
%0 Conference Paper
%T Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination
%A Zhiyao Luo
%A Yangchen Pan
%A Peter Watkinson
%A Tingting Zhu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-luo24f
%I PMLR
%P 33432--33465
%U https://proceedings.mlr.press/v235/luo24f.html
%V 235
APA
Luo, Z., Pan, Y., Watkinson, P. & Zhu, T. (2024). Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:33432-33465. Available from https://proceedings.mlr.press/v235/luo24f.html.