Open-Ended Clinical Text Generation for Acute Care: Applying Reinforcement Learning with Clinically Grounded Rewards
Proceedings of the 7th Conference on Health, Inference, and Learning, PMLR 333:966-984, 2026.
Abstract
Acute care clinicians generate critical clinical text (diagnoses, treatment plans, discharge instructions) under time pressure, where errors can be life-threatening. Large proprietary AI models raise privacy concerns, while smaller models lack clinical quality. We extend reinforcement learning with verifiable rewards (RLVR) to open-ended clinical text generation using two generalizable reward patterns: equivalence-based rewards for medical synonymy and diagnosis matching, and rubric-based rewards for multi-dimensional quality assessment. Using group relative policy optimization, we train compact 7–8 billion parameter models on three tasks: diagnosis generation (MIMIC-III), discharge instructions (DischargeMe), and treatment planning (MTSamples). The best trained models reach an F1 of 0.48 on diagnosis generation and scores of 4.28/5.0 on discharge instructions and 4.47/5.0 on treatment planning, matching or surpassing large proprietary GPT-based models while enabling on-premise deployment, sub-second inference, and full privacy. Physician review confirmed more comprehensive content and fewer dangerous errors than base models. These results demonstrate a practical pathway for deploying clinical text generation in acute care using generalizable reward design patterns.
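As a rough illustration of how an equivalence-based reward can feed group relative policy optimization, the sketch below scores each sampled diagnosis list with a set-based F1 after synonym normalization, then standardizes the rewards within the sampled group. It is a minimal sketch under stated assumptions, not the authors' implementation: the `SYNONYMS` table, the function names, and the use of a lookup table (rather than a model-based equivalence judge) are all hypothetical, and the choice of population standard deviation is one of several conventions used in practice.

```python
from statistics import mean, pstdev

# Hypothetical synonym table for illustration only; the abstract does not
# describe how medical equivalence is actually judged.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
}


def normalize(term):
    """Lowercase a diagnosis string and map known synonyms to one form."""
    t = term.strip().lower()
    return SYNONYMS.get(t, t)


def equivalence_f1(predicted, reference):
    """Set-based F1 between predicted and reference diagnoses,
    treating synonymous terms as equal."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(r) for r in reference}
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of completions sampled for the
    same prompt: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; conventions vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: three sampled completions for one (made-up) reference list.
reference = ["myocardial infarction", "sepsis"]
samples = [
    ["Heart attack", "sepsis"],
    ["heart attack", "pneumonia"],
    ["pneumonia"],
]
rewards = [equivalence_f1(s, reference) for s in samples]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean receive positive advantages and those below receive negative ones, so no separate value model is needed to provide a baseline.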