Improving Medical Visual Reinforcement Fine-Tuning via Perception and Reasoning Augmentation

Guangjing Yang, ZhangYuan Yu, Ziyuan Qin, Xinyuan Song, Huahui Yi, Qingbo Kang, Jun Gao, Yiyue Li, Chenlin Du, Qicheng Lao
Conference on Parsimony and Learning, PMLR 328:24-41, 2026.

Abstract

While recent advances in Reinforcement Fine-Tuning (RFT) have shown that rule-based reward schemes can enable effective post-training for large language models, their extension to cross-modal, vision-centric domains remains largely underexplored. This limitation is especially pronounced in the medical imaging domain, where effective performance requires both robust visual perception and structured reasoning. In this work, we address this gap by proposing \textit{VRFT-Aug}, a visual reinforcement fine-tuning framework tailored for the medical domain. VRFT-Aug introduces a series of training strategies designed to augment both perception and reasoning, including prior knowledge injection, perception-driven policy refinement, medically informed reward shaping, and behavioral imitation. Together, these methods aim to stabilize and improve the RFT process. Through extensive experiments across multiple medical datasets, we show that our approaches consistently outperform both standard supervised fine-tuning and RFT baselines. Moreover, we provide empirically grounded insights and practical training heuristics that can be generalized to other medical image tasks. We hope this work contributes actionable guidance and fresh inspiration for the ongoing effort to develop reliable, reasoning-capable models for high-stakes medical applications.

Cite this Paper


BibTeX
@InProceedings{pmlr-v328-yang26a,
  title     = {Improving Medical Visual Reinforcement Fine-Tuning via Perception and Reasoning Augmentation},
  author    = {Yang, Guangjing and Yu, ZhangYuan and Qin, Ziyuan and Song, Xinyuan and Yi, Huahui and Kang, Qingbo and Gao, Jun and Li, Yiyue and Du, Chenlin and Lao, Qicheng},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {24--41},
  year      = {2026},
  editor    = {Burkholz, Rebekka and Liu, Shiwei and Ravishankar, Saiprasad and Redman, William and Huang, Wei and Su, Weijie and Zhu, Zhihui},
  volume    = {328},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v328/main/assets/yang26a/yang26a.pdf},
  url       = {https://proceedings.mlr.press/v328/yang26a.html},
  abstract  = {While recent advances in Reinforcement Fine-Tuning (RFT) have shown that rule-based reward schemes can enable effective post-training for large language models, their extension to cross-modal, vision-centric domains remains largely underexplored. This limitation is especially pronounced in the medical imaging domain, where effective performance requires both robust visual perception and structured reasoning. In this work, we address this gap by proposing \textit{VRFT-Aug}, a visual reinforcement fine-tuning framework tailored for the medical domain. VRFT-Aug introduces a series of training strategies designed to augment both perception and reasoning, including prior knowledge injection, perception-driven policy refinement, medically informed reward shaping, and behavioral imitation. Together, these methods aim to stabilize and improve the RFT process. Through extensive experiments across multiple medical datasets, we show that our approaches consistently outperform both standard supervised fine-tuning and RFT baselines. Moreover, we provide empirically grounded insights and practical training heuristics that can be generalized to other medical image tasks. We hope this work contributes actionable guidance and fresh inspiration for the ongoing effort to develop reliable, reasoning-capable models for high-stakes medical applications.}
}
Endnote
%0 Conference Paper
%T Improving Medical Visual Reinforcement Fine-Tuning via Perception and Reasoning Augmentation
%A Guangjing Yang
%A ZhangYuan Yu
%A Ziyuan Qin
%A Xinyuan Song
%A Huahui Yi
%A Qingbo Kang
%A Jun Gao
%A Yiyue Li
%A Chenlin Du
%A Qicheng Lao
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Rebekka Burkholz
%E Shiwei Liu
%E Saiprasad Ravishankar
%E William Redman
%E Wei Huang
%E Weijie Su
%E Zhihui Zhu
%F pmlr-v328-yang26a
%I PMLR
%P 24--41
%U https://proceedings.mlr.press/v328/yang26a.html
%V 328
%X While recent advances in Reinforcement Fine-Tuning (RFT) have shown that rule-based reward schemes can enable effective post-training for large language models, their extension to cross-modal, vision-centric domains remains largely underexplored. This limitation is especially pronounced in the medical imaging domain, where effective performance requires both robust visual perception and structured reasoning. In this work, we address this gap by proposing \textit{VRFT-Aug}, a visual reinforcement fine-tuning framework tailored for the medical domain. VRFT-Aug introduces a series of training strategies designed to augment both perception and reasoning, including prior knowledge injection, perception-driven policy refinement, medically informed reward shaping, and behavioral imitation. Together, these methods aim to stabilize and improve the RFT process. Through extensive experiments across multiple medical datasets, we show that our approaches consistently outperform both standard supervised fine-tuning and RFT baselines. Moreover, we provide empirically grounded insights and practical training heuristics that can be generalized to other medical image tasks. We hope this work contributes actionable guidance and fresh inspiration for the ongoing effort to develop reliable, reasoning-capable models for high-stakes medical applications.
APA
Yang, G., Yu, Z., Qin, Z., Song, X., Yi, H., Kang, Q., Gao, J., Li, Y., Du, C. & Lao, Q. (2026). Improving Medical Visual Reinforcement Fine-Tuning via Perception and Reasoning Augmentation. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 328:24-41. Available from https://proceedings.mlr.press/v328/yang26a.html.
