Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment

Yingchuan Sun, Shengpu Tang
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:1235-1252, 2026.

Abstract

Existing studies on reinforcement learning (RL) for sepsis management have mostly followed an established problem setup, in which patient data are aggregated into 4-hour time steps. Although concerns have been raised regarding the coarseness of this time-step size, which might distort patient dynamics and lead to suboptimal treatment policies, the extent to which this is a problem in practice remains unexplored. In this work, we conducted empirical experiments for a controlled comparison of four time-step sizes ($\Delta t = 1, 2, 4, 8$ h) on this domain, following an identical offline RL pipeline. To enable a fair comparison across time-step sizes, we designed action re-mapping methods that allow for evaluation of policies on datasets with different time-step sizes, and conducted cross-$\Delta t$ model selections under two policy learning setups. Our goal was to quantify how time-step size influences state representation learning, behavior cloning, policy training, and off-policy evaluation. Our results show that performance trends across $\Delta t$ vary as learning setups change, while policies learned at finer time-step sizes ($\Delta t = 1$ h and 2 h) using a static behavior policy achieve the overall best performance and stability. Our work highlights time-step size as a core design choice in offline RL for healthcare and provides evidence supporting alternatives beyond the conventional 4-hour setup.
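The aggregation step the abstract describes (binning irregular patient measurements into $\Delta t$-hour time steps) can be sketched as follows. This is a minimal illustration, not the authors' code: the column names (`charttime`, `heart_rate`), the per-patient framing, and the mean-within-bin rule are all illustrative assumptions.

```python
# Sketch: aggregate irregularly timed patient measurements into fixed
# Delta-t-hour time steps (Delta t = 1, 2, 4, or 8 h), averaging within bins.
import pandas as pd

def aggregate_time_steps(df: pd.DataFrame, dt_hours: int) -> pd.DataFrame:
    """Bin one patient's chart events into dt_hours-wide steps,
    taking the mean of each measurement within a bin."""
    df = df.set_index("charttime").sort_index()
    return df.resample(f"{dt_hours}h").mean().dropna(how="all")

# Example: four readings collapsed into two 4-hour time steps.
events = pd.DataFrame({
    "charttime": pd.to_datetime([
        "2026-01-01 00:30", "2026-01-01 01:45",
        "2026-01-01 04:10", "2026-01-01 06:20",
    ]),
    "heart_rate": [88.0, 92.0, 95.0, 91.0],
})
print(aggregate_time_steps(events, dt_hours=4))
```

Varying `dt_hours` over 1, 2, 4, and 8 reproduces the kind of controlled comparison the paper performs, with everything downstream of this step held fixed.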

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-sun26a,
  title     = {Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment},
  author    = {Sun, Yingchuan and Tang, Shengpu},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages     = {1235--1252},
  year      = {2026},
  editor    = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume    = {297},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/sun26a/sun26a.pdf},
  url       = {https://proceedings.mlr.press/v297/sun26a.html},
  abstract  = {Existing studies on reinforcement learning ({RL}) for sepsis management have mostly followed an established problem setup, in which patient data are aggregated into 4-hour time steps. Although concerns have been raised regarding the coarseness of this time-step size, which might distort patient dynamics and lead to suboptimal treatment policies, the extent to which this is a problem in practice remains unexplored. In this work, we conducted empirical experiments for a controlled comparison of four time-step sizes ($\Delta t = 1, 2, 4, 8$ h) on this domain, following an identical offline {RL} pipeline. To enable a fair comparison across time-step sizes, we designed action re-mapping methods that allow for evaluation of policies on datasets with different time-step sizes, and conducted cross-$\Delta t$ model selections under two policy learning setups. Our goal was to quantify how time-step size influences state representation learning, behavior cloning, policy training, and off-policy evaluation. Our results show that performance trends across $\Delta t$ vary as learning setups change, while policies learned at finer time-step sizes ($\Delta t = 1$ h and 2 h) using a static behavior policy achieve the overall best performance and stability. Our work highlights time-step size as a core design choice in offline {RL} for healthcare and provides evidence supporting alternatives beyond the conventional 4-hour setup.}
}
Endnote
%0 Conference Paper
%T Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment
%A Yingchuan Sun
%A Shengpu Tang
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-sun26a
%I PMLR
%P 1235--1252
%U https://proceedings.mlr.press/v297/sun26a.html
%V 297
%X Existing studies on reinforcement learning (RL) for sepsis management have mostly followed an established problem setup, in which patient data are aggregated into 4-hour time steps. Although concerns have been raised regarding the coarseness of this time-step size, which might distort patient dynamics and lead to suboptimal treatment policies, the extent to which this is a problem in practice remains unexplored. In this work, we conducted empirical experiments for a controlled comparison of four time-step sizes ($\Delta t = 1, 2, 4, 8$ h) on this domain, following an identical offline RL pipeline. To enable a fair comparison across time-step sizes, we designed action re-mapping methods that allow for evaluation of policies on datasets with different time-step sizes, and conducted cross-$\Delta t$ model selections under two policy learning setups. Our goal was to quantify how time-step size influences state representation learning, behavior cloning, policy training, and off-policy evaluation. Our results show that performance trends across $\Delta t$ vary as learning setups change, while policies learned at finer time-step sizes ($\Delta t = 1$ h and 2 h) using a static behavior policy achieve the overall best performance and stability. Our work highlights time-step size as a core design choice in offline RL for healthcare and provides evidence supporting alternatives beyond the conventional 4-hour setup.
APA
Sun, Y. & Tang, S. (2026). Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:1235-1252. Available from https://proceedings.mlr.press/v297/sun26a.html.