Reinforcement Learning with Action-Derived Rewards for Chemotherapy and Clinical Trial Dosing Regimen Selection

Gregory Yauney, Pratik Shah
Proceedings of the 3rd Machine Learning for Healthcare Conference, PMLR 85:161-226, 2018.

Abstract

Unstructured learning problems without well-defined rewards are unsuitable for current reinforcement learning (RL) approaches. Action-derived rewards can allow RL agents to fully explore state and action trade-offs in scenarios that require specific outcomes but lack an external reward structure. Clinical trial dosing choice is an example of such a problem. We report the successful formulation of clinical trial dosing choice as an RL problem using action-based rewards, and the learning of dosing regimens that reduce mean tumor diameters (MTD) in patients undergoing simulated temozolomide (TMZ) and procarbazine, 1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea, and vincristine (PCV) chemo- and radiotherapy clinical trials. The use of action-derived rewards as partial proxies for outcomes is described for the first time. Novel dosing regimens learned by an RL agent in the presence of action-derived rewards achieve significant reductions in MTD for cohorts and individual patients in simulated TMZ and PCV clinical trials while reducing treatment cycle administrations and dosage concentrations compared to human-expert dosing regimens. Our approach can be readily adapted to other learning tasks where outcome-based learning is not practical.
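To make the core idea concrete, the following is a minimal, hypothetical sketch of tabular Q-learning with an action-derived reward: the per-step reward is computed from the chosen dose (penalizing administrations and concentrations) rather than from an observed patient outcome, with an illustrative terminal term for tumor shrinkage. The dose set, tumor-growth model, reward weights, and the step/action_reward/discretize helpers are placeholders for illustration, not the simulation or reward structure used in the paper.

import random
from collections import defaultdict

DOSES = [0.0, 0.5, 1.0]            # hypothetical dose concentrations per cycle
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1
N_CYCLES = 12                      # treatment cycles per simulated trial

def step(diameter, dose):
    # Hypothetical tumor dynamics: growth offset by a dose effect, plus noise.
    growth = 0.08 * diameter
    shrinkage = 0.5 * dose * diameter
    return max(0.0, diameter + growth - shrinkage + random.gauss(0.0, 0.05))

def action_reward(dose):
    # Action-derived reward: depends only on the chosen dose, not on the outcome.
    administered = 1.0 if dose > 0 else 0.0
    return -(0.5 * administered + 1.0 * dose)

def discretize(diameter):
    # Coarse state: tumor diameter binned for a tabular Q-function.
    return min(int(diameter * 10), 50)

Q = defaultdict(float)

for episode in range(3000):
    diameter = random.uniform(1.0, 3.0)   # initial mean tumor diameter (arbitrary units)
    initial = diameter
    for t in range(N_CYCLES):
        s = discretize(diameter)
        if random.random() < EPSILON:
            a = random.randrange(len(DOSES))          # explore
        else:
            a = max(range(len(DOSES)), key=lambda i: Q[(s, i)])  # exploit
        diameter = step(diameter, DOSES[a])
        r = action_reward(DOSES[a])
        if t == N_CYCLES - 1:
            # Illustrative terminal outcome term rewarding overall tumor shrinkage;
            # the per-step signal above remains purely action-based.
            target = r + 5.0 * (initial - diameter)
        else:
            s2 = discretize(diameter)
            target = r + GAMMA * max(Q[(s2, i)] for i in range(len(DOSES)))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Greedy dose suggested for a hypothetical patient presenting with diameter 2.0
s0 = discretize(2.0)
print("suggested dose:", DOSES[max(range(len(DOSES)), key=lambda i: Q[(s0, i)])])

The sketch only illustrates the trade-off the abstract describes: the agent is charged for every administration and for higher concentrations, so it learns regimens that dose no more than needed to achieve the outcome term.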

Cite this Paper


BibTeX
@InProceedings{pmlr-v85-yauney18a,
  title     = {Reinforcement Learning with Action-Derived Rewards for Chemotherapy and Clinical Trial Dosing Regimen Selection},
  author    = {Yauney, Gregory and Shah, Pratik},
  booktitle = {Proceedings of the 3rd Machine Learning for Healthcare Conference},
  pages     = {161--226},
  year      = {2018},
  editor    = {Finale Doshi-Velez and Jim Fackler and Ken Jung and David Kale and Rajesh Ranganath and Byron Wallace and Jenna Wiens},
  volume    = {85},
  series    = {Proceedings of Machine Learning Research},
  address   = {Palo Alto, California},
  month     = {17--18 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v85/yauney18a/yauney18a.pdf},
  url       = {http://proceedings.mlr.press/v85/yauney18a.html}
}
Endnote
%0 Conference Paper
%T Reinforcement Learning with Action-Derived Rewards for Chemotherapy and Clinical Trial Dosing Regimen Selection
%A Gregory Yauney
%A Pratik Shah
%B Proceedings of the 3rd Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2018
%E Finale Doshi-Velez
%E Jim Fackler
%E Ken Jung
%E David Kale
%E Rajesh Ranganath
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v85-yauney18a
%I PMLR
%J Proceedings of Machine Learning Research
%P 161--226
%U http://proceedings.mlr.press
%V 85
%W PMLR
APA
Yauney, G. & Shah, P. (2018). Reinforcement Learning with Action-Derived Rewards for Chemotherapy and Clinical Trial Dosing Regimen Selection. Proceedings of the 3rd Machine Learning for Healthcare Conference, in PMLR 85:161-226.
