Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models

Binesh Sadanandan, Vahid Behzadan
Proceedings of the 7th Conference on Health, Inference, and Learning, PMLR 333:703-720, 2026.

Abstract

Medical vision-language models can give different yes or no answers to rephrasings of the same clinical question. We study this in MedGemma-4B using PSF-Med, which provides paraphrase pairs for systematic consistency evaluation on medical VQA. On MIMIC-CXR binary questions ($n=158$), the baseline flip rate is 14.6% and mean margin difference is 1.63 logits. We validate that Gemma Scope 2 Sparse Autoencoders (SAEs) transfer to MedGemma activations, achieving $R^2 \approx 0.997$ on both medical and general text ($n=100$ prompts each, $p<0.001$ for exceeding a 0.95 threshold). We then fine-tune Low-Rank Adaptation (LoRA) adapters with a combined loss that balances paraphrase consistency with answer accuracy. This combined approach prevents mode collapse that occurs with pure consistency training while reducing flip rate from 14.6% to 4.4% ($p=0.002$, two-proportion z-test) and margin difference from 1.63 to 0.33 (79.5% reduction). Accuracy remains stable at 84.2% baseline versus 82.3% after training (-1.9pp, not significant). On PadChest Balanced ($n=250$), flip rate drops from 13.6% to 7.8%, mean margin difference drops from 1.08 to 0.35 (67.9% reduction), and accuracy increases from 66.4% to 69.4%. A layer-range ablation shows that early layers reduce margin differences more than mechanistically selected middle layers.
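
The reported significance of the MIMIC-CXR flip-rate reduction can be sanity-checked with a minimal sketch of a pooled two-proportion z-test. The flip counts below (23 and 7 out of 158) are not stated in the abstract; they are assumptions inferred from the rounded rates 14.6% and 4.4%. The function name is hypothetical, and the authors' exact test variant (pooled versus unpooled, one- or two-sided) is likewise assumed.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail of the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts, inferred from the rounded percentages in the abstract.
N_PAIRS = 158
FLIPS_BASELINE = 23  # 23 / 158 = 14.6%
FLIPS_ADAPTED = 7    # 7 / 158 = 4.4%

z, p_value = two_proportion_z_test(FLIPS_BASELINE, N_PAIRS, FLIPS_ADAPTED, N_PAIRS)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Under these assumed counts the p-value comes out at about 0.002, in line with the figure quoted in the abstract. Because both rates are measured on the same 158 question pairs, a paired test such as McNemar's would be a reasonable alternative; the sketch follows the test named in the abstract.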

Cite this Paper


BibTeX
@InProceedings{pmlr-v333-sadanandan26a,
  title = {Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models},
  author = {Sadanandan, Binesh and Behzadan, Vahid},
  booktitle = {Proceedings of the 7th Conference on Health, Inference, and Learning},
  pages = {703--720},
  year = {2026},
  editor = {Healey, Elizabeth and Fries, Jason and Pollard, Tom and Tang, Shengpu and Zink, Anna and Hartvigsen, Tom and Agrawal, Monica and Finlayson, Sam and Glicksberg, Benjamin and Beaulieu-Jones, Brett and Wang, Kai and Fontalvo, Daseyra and Sarker, Tasmie and Chen, Irene and Alsentzer, Emily},
  volume = {333},
  series = {Proceedings of Machine Learning Research},
  month = {29--30 Jun},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v333/main/assets/sadanandan26a/sadanandan26a.pdf},
  url = {https://proceedings.mlr.press/v333/sadanandan26a.html},
  abstract = {Medical vision-language models can give different yes or no answers to rephrasings of the same clinical question. We study this in MedGemma-4B using PSF-Med, which provides paraphrase pairs for systematic consistency evaluation on medical VQA. On MIMIC-CXR binary questions ($n=158$), the baseline flip rate is 14.6% and mean margin difference is 1.63 logits. We validate that Gemma Scope 2 Sparse Autoencoders (SAEs) transfer to MedGemma activations, achieving $R^2 \approx 0.997$ on both medical and general text ($n=100$ prompts each, $p<0.001$ for exceeding a 0.95 threshold). We then fine-tune Low-Rank Adaptation (LoRA) adapters with a combined loss that balances paraphrase consistency with answer accuracy. This combined approach prevents mode collapse that occurs with pure consistency training while reducing flip rate from 14.6% to 4.4% ($p=0.002$, two-proportion z-test) and margin difference from 1.63 to 0.33 (79.5% reduction). Accuracy remains stable at 84.2% baseline versus 82.3% after training (-1.9pp, not significant). On PadChest Balanced ($n=250$), flip rate drops from 13.6% to 7.8%, mean margin difference drops from 1.08 to 0.35 (67.9% reduction), and accuracy increases from 66.4% to 69.4%. A layer-range ablation shows that early layers reduce margin differences more than mechanistically selected middle layers.}
}
Endnote
%0 Conference Paper
%T Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models
%A Binesh Sadanandan
%A Vahid Behzadan
%B Proceedings of the 7th Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Elizabeth Healey
%E Jason Fries
%E Tom Pollard
%E Shengpu Tang
%E Anna Zink
%E Tom Hartvigsen
%E Monica Agrawal
%E Sam Finlayson
%E Benjamin Glicksberg
%E Brett Beaulieu-Jones
%E Kai Wang
%E Daseyra Fontalvo
%E Tasmie Sarker
%E Irene Chen
%E Emily Alsentzer
%F pmlr-v333-sadanandan26a
%I PMLR
%P 703--720
%U https://proceedings.mlr.press/v333/sadanandan26a.html
%V 333
%X Medical vision-language models can give different yes or no answers to rephrasings of the same clinical question. We study this in MedGemma-4B using PSF-Med, which provides paraphrase pairs for systematic consistency evaluation on medical VQA. On MIMIC-CXR binary questions ($n=158$), the baseline flip rate is 14.6% and mean margin difference is 1.63 logits. We validate that Gemma Scope 2 Sparse Autoencoders (SAEs) transfer to MedGemma activations, achieving $R^2 \approx 0.997$ on both medical and general text ($n=100$ prompts each, $p<0.001$ for exceeding a 0.95 threshold). We then fine-tune Low-Rank Adaptation (LoRA) adapters with a combined loss that balances paraphrase consistency with answer accuracy. This combined approach prevents mode collapse that occurs with pure consistency training while reducing flip rate from 14.6% to 4.4% ($p=0.002$, two-proportion z-test) and margin difference from 1.63 to 0.33 (79.5% reduction). Accuracy remains stable at 84.2% baseline versus 82.3% after training (-1.9pp, not significant). On PadChest Balanced ($n=250$), flip rate drops from 13.6% to 7.8%, mean margin difference drops from 1.08 to 0.35 (67.9% reduction), and accuracy increases from 66.4% to 69.4%. A layer-range ablation shows that early layers reduce margin differences more than mechanistically selected middle layers.
APA
Sadanandan, B., & Behzadan, V. (2026). Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models. Proceedings of the 7th Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 333:703-720. Available from https://proceedings.mlr.press/v333/sadanandan26a.html.
