Guardian-regularized Safe Offline Reinforcement Learning for Smart Weaning of Mechanical Circulatory Devices
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:1269-1296, 2026.
Abstract
We study the sequential decision-making problem of automated weaning of mechanical circulatory support (MCS) devices in cardiogenic shock patients. MCS devices are percutaneous micro-axial flow pumps that provide left ventricular unloading and forward blood flow, but current weaning strategies vary significantly across care teams and lack data-driven approaches. Offline reinforcement learning (RL) has proven successful in sequential decision-making tasks, but our setting poses challenges for training and evaluating traditional offline RL methods: prohibition of online patient interaction, highly uncertain circulatory dynamics due to concurrent treatments, and limited data availability. We develop an end-to-end machine learning framework with two key contributions: (1) Clinically-aware OOD-regularized Model-based Policy Optimization (CORMPO), a density-regularized offline RL algorithm that suppresses out-of-distribution actions and incorporates clinically informed reward shaping, and (2) a Transformer-based probabilistic digital twin that models MCS circulatory dynamics for policy evaluation with rich physiological and clinical metrics. We prove that CORMPO achieves theoretical performance guarantees under mild assumptions. CORMPO attains 28% higher reward than offline RL baselines and 82.6% higher scores on clinical metrics across real and synthetic datasets. Our approach offers a principled framework for safe offline policy learning in high-stakes medical applications where domain expertise and safety constraints are essential.
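The core idea of density-regularized OOD suppression can be illustrated with a minimal sketch: fit a density model to the behavioral (state, action) distribution in the offline dataset, then penalize model-based rollout rewards for pairs with low estimated density, so the learned policy is discouraged from leaving the data support. This is an assumption-laden illustration (a Gaussian KDE and a log-density penalty), not the paper's actual CORMPO implementation; all names here are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical offline dataset of concatenated (state, action) vectors.
# Real MCS weaning data would be patient trajectories; this is synthetic.
rng = np.random.default_rng(0)
dataset_sa = rng.normal(0.0, 1.0, size=(500, 3))  # 500 samples, 3 dims

# Fit a density estimator over the behavior distribution (KDE expects (d, n)).
density = gaussian_kde(dataset_sa.T)

def regularized_reward(base_reward, state_action, lam=1.0, eps=1e-8):
    """Penalize the reward for state-action pairs with low behavior density,
    steering a model-based policy away from out-of-distribution regions."""
    log_p = float(np.log(density(state_action.reshape(-1, 1)) + eps)[0])
    # Only penalize low density; never reward being merely in-distribution.
    return base_reward + lam * min(log_p, 0.0)

in_dist = np.zeros(3)          # near the center of the data
out_dist = np.full(3, 10.0)    # far outside the data support
r_in = regularized_reward(1.0, in_dist)
r_out = regularized_reward(1.0, out_dist)
# The OOD pair receives a strictly lower regularized reward than the
# in-distribution pair, which is the qualitative behavior CORMPO relies on.
```

In a full model-based pipeline, this penalized reward would replace the raw reward during synthetic rollouts from the learned dynamics model, combined with the clinically informed shaping terms described in the paper.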