Patient-Specific Deep Reinforcement Learning for Automatic Replanning in Head-and-Neck Cancer Proton Therapy
Proceedings of the 10th Machine Learning for Healthcare Conference, PMLR 298, 2025.
Abstract
Anatomical changes in head-and-neck cancer (HNC) patients during intensity-modulated proton therapy (IMPT) can shift the Bragg peak of proton beams, risking tumor underdosing and organ-at-risk (OAR) overdosing. As a result, treatment replanning is often required to maintain clinically acceptable treatment quality. However, manual replanning is resource-intensive and time-consuming. In this work, we propose a patient-specific deep reinforcement learning (DRL) framework for automated IMPT replanning, with a reward-shaping mechanism based on a $150$-point plan quality score designed to handle the competing clinical objectives in radiotherapy planning. We formulate the planning process as an RL problem in which agents learn high-dimensional control policies that adjust plan optimization priorities to maximize plan quality. Unlike population-based approaches, our framework trains a personalized agent for each patient using that patient's planning CT and augmented anatomies that simulate anatomical changes (tumor progression and regression). This patient-specific approach leverages anatomical similarities along the treatment course, enabling effective plan adaptation. We implemented and compared two DRL algorithms, Deep Q-Network (DQN) and Proximal Policy Optimization (PPO), using dose-volume histograms (DVHs) as state representations and a $22$-dimensional action space of priority adjustments. Evaluation on five HNC patients using actual replanning CT data showed that both DRL agents improved initial plan scores from $120.63 \pm 21.40$ to $139.78 \pm 6.84$ (DQN) and $142.74 \pm 5.16$ (PPO), surpassing the replans manually generated by a human planner ($137.20 \pm 5.58$). Clinical validation confirmed that these improvements translate to better tumor coverage and OAR sparing across diverse anatomical changes. This work highlights the potential of DRL in addressing the geometric and dosimetric complexities of adaptive proton therapy, offering a promising solution for efficient offline adaptation and paving the way for online adaptive proton therapy.
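To make the RL formulation in the abstract concrete, the following is a minimal sketch of one possible environment step, not the authors' implementation. Several things here are assumptions made only for illustration: the 22 actions are taken to be 11 optimization priorities each with an increase and a decrease move, the reward is taken to be the change in plan score between consecutive steps, and `toy_plan_score` is a hypothetical stand-in for "re-optimize the plan and score it on the 150-point scale" (it contains no dose calculation and no DVH state). The greedy rollout merely takes the place of a trained DQN or PPO policy.

```python
import math

N_PRIORITIES = 11               # assumption: 11 priorities x {up, down} = 22 actions
N_ACTIONS = 2 * N_PRIORITIES
MAX_SCORE = 150.0               # plan quality score is out of 150 points
STEP = 1.25                     # hypothetical multiplicative adjustment per action


def apply_action(priorities, action):
    """Scale one priority up (even action) or down (odd action)."""
    idx, direction = divmod(action, 2)
    factor = STEP if direction == 0 else 1.0 / STEP
    updated = list(priorities)
    updated[idx] *= factor
    return updated


def toy_plan_score(priorities, target):
    """Hypothetical stand-in for plan optimization plus scoring.

    Returns a value in (0, 150] that peaks when the priorities match a
    hidden 'ideal' setting. A real system would run the treatment
    planning optimizer and score the resulting dose distribution.
    """
    err = sum(abs(math.log(p / t)) for p, t in zip(priorities, target))
    return MAX_SCORE * math.exp(-err / len(target))


def step(priorities, action, target):
    """One environment transition; reward is the change in plan score."""
    new_priorities = apply_action(priorities, action)
    reward = (toy_plan_score(new_priorities, target)
              - toy_plan_score(priorities, target))
    return new_priorities, reward


def greedy_rollout(priorities, target, max_steps=50):
    """Placeholder policy: take the best one-step action until none helps."""
    total_reward = 0.0
    for _ in range(max_steps):
        candidates = [step(priorities, a, target) for a in range(N_ACTIONS)]
        best_priorities, best_reward = max(candidates, key=lambda c: c[1])
        if best_reward <= 0.0:
            break
        priorities = best_priorities
        total_reward += best_reward
    return priorities, total_reward


if __name__ == "__main__":
    start = [1.0] * N_PRIORITIES
    ideal = list(start)
    ideal[0] = STEP ** 3        # this priority should be raised
    ideal[5] = STEP ** -2       # this priority should be lowered
    final, gained = greedy_rollout(start, ideal)
    print(toy_plan_score(start, ideal), toy_plan_score(final, ideal), gained)
```

With a score-difference reward of this kind, the return over an episode telescopes to the final score minus the initial score, so maximizing return corresponds to maximizing final plan quality under the stated assumptions.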