Using Reinforcement Learning for Multi-Objective Cluster-Level Optimization of Non-Pharmaceutical Interventions for Infectious Disease

Xueqiao Peng, Jiaqi Xu, Xi Chen, Dinh Song An Nguyen, Andrew Perrault
Proceedings of the 3rd Machine Learning for Health Symposium, PMLR 225:445-460, 2023.

Abstract

In the early stages of an infectious disease crisis, non-pharmaceutical interventions (NPIs) such as quarantines and testing can play an important role. Optimizing the delivery of NPIs is challenging as they can impose substantial direct costs (e.g., test costs) and human impacts (e.g., quarantine of uninfected individuals) and can be especially difficult to target for infections that may spread pre- or asymptomatically. In addition, superspreading, a common characteristic of many infectious diseases, induces informational dependencies across a cluster (a group of individuals exposed by the same seed case). We formulate NPI optimization as a partially observable Markov decision process (POMDP), which we aim to solve with reinforcement learning (RL). We find that RL provides a promising technical foundation, but that the problem poses challenges with which even modern approaches struggle. We propose a novel RL approach that leverages a supervised learning decoder as well as permutation-invariant, fixed-size observation representations. Through extensive experimentation and evaluation, we show that our optimized policy can outperform all benchmarks by up to 27%. Additionally, we show that the policies discovered by RL can be distilled into decision trees to simplify deployment while still achieving strong performance. We publicly release our code and RL environments at: https://github.com/XueqiaoPeng/Covid-RLSL
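A note on the two techniques named in the abstract: a permutation-invariant, fixed-size observation over a cluster, and distillation of the learned policy into a decision tree. The paper and the released Covid-RLSL code define these precisely; what follows is only a minimal Python sketch of the general ideas, and every name in it (encode_cluster, distill_policy, the feature layout, the 0-1 feature range) is an illustrative assumption rather than the authors' implementation.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def encode_cluster(member_features: np.ndarray, bins: int = 10) -> np.ndarray:
        """Pool per-member features of a variable-size cluster into a
        fixed-size vector. Means, maxima, and histograms are unchanged
        by reordering members, giving permutation invariance."""
        pooled = np.concatenate([member_features.mean(axis=0),
                                 member_features.max(axis=0)])
        # Histogram of one scalar feature (hypothetically, normalized days
        # since exposure) preserves distributional shape at a fixed size.
        hist, _ = np.histogram(member_features[:, 0], bins=bins, range=(0.0, 1.0))
        return np.concatenate([pooled, hist / max(len(member_features), 1)])

    def distill_policy(policy, observations: np.ndarray, max_depth: int = 4):
        """Imitate a trained RL policy with a small decision tree by
        labeling each observation with the policy's chosen action."""
        actions = np.array([policy(obs) for obs in observations])
        tree = DecisionTreeClassifier(max_depth=max_depth)
        return tree.fit(observations, actions)

A shallow tree over such pooled features can be printed as a handful of human-readable test/quarantine rules, which is the kind of deployment simplification the abstract describes.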

Cite this Paper


BibTeX
@InProceedings{pmlr-v225-peng23a,
  title = {Using Reinforcement Learning for Multi-Objective Cluster-Level Optimization of Non-Pharmaceutical Interventions for Infectious Disease},
  author = {Peng, Xueqiao and Xu, Jiaqi and Chen, Xi and Nguyen, Dinh Song An and Perrault, Andrew},
  booktitle = {Proceedings of the 3rd Machine Learning for Health Symposium},
  pages = {445--460},
  year = {2023},
  editor = {Hegselmann, Stefan and Parziale, Antonio and Shanmugam, Divya and Tang, Shengpu and Asiedu, Mercy Nyamewaa and Chang, Serina and Hartvigsen, Tom and Singh, Harvineet},
  volume = {225},
  series = {Proceedings of Machine Learning Research},
  month = {10 Dec},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v225/peng23a/peng23a.pdf},
  url = {https://proceedings.mlr.press/v225/peng23a.html},
  abstract = {In the early stages of an infectious disease crisis, non-pharmaceutical interventions (NPIs) such as quarantines and testing can play an important role. Optimizing the delivery of NPIs is challenging as they can impose substantial direct costs (e.g., test costs) and human impacts (e.g., quarantine of uninfected individuals) and can be especially difficult to target for infections that may spread pre- or asymptomatically. In addition, superspreading, a common characteristic of many infectious diseases, induces informational dependencies across a cluster (a group of individuals exposed by the same seed case). We formulate NPI optimization as a partially observable Markov decision process (POMDP), which we aim to solve with reinforcement learning (RL). We find that RL provides a promising technical foundation, but that the problem poses challenges with which even modern approaches struggle. We propose a novel RL approach that leverages a supervised learning decoder as well as permutation-invariant, fixed-size observation representations. Through extensive experimentation and evaluation, we show that our optimized policy can outperform all benchmarks by up to 27\%. Additionally, we show that the policies discovered by RL can be distilled into decision trees to simplify deployment while still achieving strong performance. We publicly release our code and RL environments at: https://github.com/XueqiaoPeng/Covid-RLSL}
}
Endnote
%0 Conference Paper
%T Using Reinforcement Learning for Multi-Objective Cluster-Level Optimization of Non-Pharmaceutical Interventions for Infectious Disease
%A Xueqiao Peng
%A Jiaqi Xu
%A Xi Chen
%A Dinh Song An Nguyen
%A Andrew Perrault
%B Proceedings of the 3rd Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2023
%E Stefan Hegselmann
%E Antonio Parziale
%E Divya Shanmugam
%E Shengpu Tang
%E Mercy Nyamewaa Asiedu
%E Serina Chang
%E Tom Hartvigsen
%E Harvineet Singh
%F pmlr-v225-peng23a
%I PMLR
%P 445--460
%U https://proceedings.mlr.press/v225/peng23a.html
%V 225
%X In the early stages of an infectious disease crisis, non-pharmaceutical interventions (NPIs) such as quarantines and testing can play an important role. Optimizing the delivery of NPIs is challenging as they can impose substantial direct costs (e.g., test costs) and human impacts (e.g., quarantine of uninfected individuals) and can be especially difficult to target for infections that may spread pre- or asymptomatically. In addition, superspreading, a common characteristic of many infectious diseases, induces informational dependencies across a cluster (a group of individuals exposed by the same seed case). We formulate NPI optimization as a partially observable Markov decision process (POMDP), which we aim to solve with reinforcement learning (RL). We find that RL provides a promising technical foundation, but that the problem poses challenges with which even modern approaches struggle. We propose a novel RL approach that leverages a supervised learning decoder as well as permutation-invariant, fixed-size observation representations. Through extensive experimentation and evaluation, we show that our optimized policy can outperform all benchmarks by up to 27%. Additionally, we show that the policies discovered by RL can be distilled into decision trees to simplify deployment while still achieving strong performance. We publicly release our code and RL environments at: https://github.com/XueqiaoPeng/Covid-RLSL
APA
Peng, X., Xu, J., Chen, X., Nguyen, D.S.A. & Perrault, A. (2023). Using Reinforcement Learning for Multi-Objective Cluster-Level Optimization of Non-Pharmaceutical Interventions for Infectious Disease. Proceedings of the 3rd Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 225:445-460. Available from https://proceedings.mlr.press/v225/peng23a.html.