Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies

Shengpu Tang, Aditya Modi, Michael Sjoding, Jenna Wiens
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9387-9396, 2020.

Abstract

Standard reinforcement learning (RL) aims to find an optimal policy that identifies the best action for each state. However, in healthcare settings, many actions may be near-equivalent with respect to the reward (e.g., survival). We consider an alternative objective: learning set-valued policies to capture near-equivalent actions that lead to similar cumulative rewards. We propose a model-free algorithm based on temporal difference learning and a near-greedy heuristic for action selection. We analyze the theoretical properties of the proposed algorithm, providing optimality guarantees, and demonstrate our approach on simulated environments and a real clinical task. Empirically, the proposed algorithm exhibits good convergence properties and discovers meaningful near-equivalent actions. Our work provides theoretical, as well as practical, foundations for clinician/human-in-the-loop decision making, in which humans (e.g., clinicians, patients) can incorporate additional knowledge (e.g., side effects, patient preference) when selecting among near-equivalent actions.
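To make the core idea concrete, below is a minimal sketch pairing tabular temporal difference (Q-)learning with a near-greedy, set-valued action rule that keeps every action whose value is within a tolerance of the best. This is an illustration, not the authors' exact algorithm: the additive threshold, the tolerance name `zeta`, and the `env.reset()`/`env.step(a)` interface are assumptions made for this sketch.

import numpy as np

# Minimal sketch (assumptions noted above): tabular Q-learning whose
# exploitation step samples from a near-greedy set of actions instead of
# committing to a single argmax. The per-state set of near-equivalent
# actions is the learned set-valued policy a human can choose among.

def near_greedy_set(q_row, zeta):
    # All actions whose Q-value is within `zeta` of the best action's value.
    return np.flatnonzero(q_row >= q_row.max() - zeta)

def learn_set_valued_policy(env, n_states, n_actions, zeta=0.1,
                            alpha=0.1, gamma=0.99, epsilon=0.1,
                            episodes=5000, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False          # assumed interface: reset() -> state
        while not done:
            if rng.random() < epsilon:        # occasional random exploration
                a = rng.integers(n_actions)
            else:                             # exploit any near-equivalent action
                a = rng.choice(near_greedy_set(Q[s], zeta))
            s_next, r, done = env.step(a)     # assumed interface: step(a) -> (s', r, done)
            # standard TD(0) backup toward the greedy one-step target
            target = r + (0.0 if done else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    # the set-valued policy: one near-equivalent action set per state
    return {s: near_greedy_set(Q[s], zeta) for s in range(n_states)}

With zeta = 0 this collapses to ordinary greedy Q-learning; a larger zeta trades a small loss in value for a larger action set, which is where a clinician's side information (side effects, patient preference) can break ties.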

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-tang20c,
  title     = {Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies},
  author    = {Tang, Shengpu and Modi, Aditya and Sjoding, Michael and Wiens, Jenna},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {9387--9396},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/tang20c/tang20c.pdf},
  url       = {https://proceedings.mlr.press/v119/tang20c.html}
}
Endnote
%0 Conference Paper
%T Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies
%A Shengpu Tang
%A Aditya Modi
%A Michael Sjoding
%A Jenna Wiens
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-tang20c
%I PMLR
%P 9387--9396
%U https://proceedings.mlr.press/v119/tang20c.html
%V 119
APA
Tang, S., Modi, A., Sjoding, M., & Wiens, J. (2020). Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9387-9396. Available from https://proceedings.mlr.press/v119/tang20c.html.
