Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models

Binesh Sadanandan; Vahid Behzadan

Proceedings of the 7th Conference on Health, Inference, and Learning, PMLR 333:703-720, 2026.

Abstract

Medical vision-language models can give different yes or no answers to rephrasings of the same clinical question. We study this in MedGemma-4B using PSF-Med, which provides paraphrase pairs for systematic consistency evaluation on medical VQA. On MIMIC-CXR binary questions ($n=158$), the baseline flip rate is 14.6% and mean margin difference is 1.63 logits. We validate that Gemma Scope 2 Sparse Autoencoders (SAEs) transfer to MedGemma activations, achieving $R^2 \approx 0.997$ on both medical and general text ($n=100$ prompts each, $p<0.001$ for exceeding a 0.95 threshold). We then fine-tune Low-Rank Adaptation (LoRA) adapters with a combined loss that balances paraphrase consistency with answer accuracy. This combined approach prevents mode collapse that occurs with pure consistency training while reducing flip rate from 14.6% to 4.4% ($p=0.002$, two-proportion z-test) and margin difference from 1.63 to 0.33 (79.5% reduction). Accuracy remains stable at 84.2% baseline versus 82.3% after training (-1.9pp, not significant). On PadChest Balanced ($n=250$), flip rate drops from 13.6% to 7.8%, mean margin difference drops from 1.08 to 0.35 (67.9% reduction), and accuracy increases from 66.4% to 69.4%. A layer-range ablation shows that early layers reduce margin differences more than mechanistically selected middle layers.
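The reported significance of the MIMIC-CXR flip-rate reduction can be sanity-checked with a short calculation. The sketch below is a minimal pooled two-proportion z-test using only the Python standard library. The abstract gives percentages rather than raw counts, so the counts used here (23 and 7 flips out of 158) are assumptions, chosen as the integers that round to 14.6% and 4.4%. The function name `two_proportion_z_test` is hypothetical and is not taken from the paper's code.

```python
import math


def two_proportion_z_test(flips_a, n_a, flips_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = flips_a / n_a
    p_b = flips_b / n_b
    # Pooled proportion under the null hypothesis of equal flip rates.
    pooled = (flips_a + flips_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts (not stated in the abstract): 23/158 = 14.6%, 7/158 = 4.4%.
N_QUESTIONS = 158
BASELINE_FLIPS = 23
TUNED_FLIPS = 7

z, p_value = two_proportion_z_test(
    BASELINE_FLIPS, N_QUESTIONS, TUNED_FLIPS, N_QUESTIONS
)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Under these assumed counts the two-sided p-value comes out close to the reported $p=0.002$. The calculation treats the baseline and fine-tuned results as independent samples, as a two-proportion z-test does; whether the authors used a pooled or unpooled variance is not stated in the abstract.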

Cite this Paper

BibTeX

@InProceedings{pmlr-v333-sadanandan26a,
  title = 	 {Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models},
  author =       {Sadanandan, Binesh and Behzadan, Vahid},
  booktitle = 	 {Proceedings of the 7th Conference on Health, Inference, and Learning},
  pages = 	 {703--720},
  year = 	 {2026},
  editor = 	 {Healey, Elizabeth and Fries, Jason and Pollard, Tom and Tang, Shengpu and Zink, Anna and Hartvigsen, Tom and Agrawal, Monica and Finlayson, Sam and Glicksberg, Benjamin and Beaulieu-Jones, Brett and Wang, Kai and Fontalvo, Daseyra and Sarker, Tasmie and Chen, Irene and Alsentzer, Emily},
  volume = 	 {333},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29--30 Jun},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v333/main/assets/sadanandan26a/sadanandan26a.pdf},
  url = 	 {https://proceedings.mlr.press/v333/sadanandan26a.html},
  abstract = 	 {Medical vision-language models can give different yes or no answers to rephrasings of the same clinical question. We study this in MedGemma-4B using PSF-Med, which provides paraphrase pairs for systematic consistency evaluation on medical VQA. On MIMIC-CXR binary questions ($n=158$), the baseline flip rate is 14.6% and mean margin difference is 1.63 logits. We validate that Gemma Scope 2 Sparse Autoencoders (SAEs) transfer to MedGemma activations, achieving $R^2 \approx 0.997$ on both medical and general text ($n=100$ prompts each, $p<0.001$ for exceeding a 0.95 threshold). We then fine-tune Low-Rank Adaptation (LoRA) adapters with a combined loss that balances paraphrase consistency with answer accuracy. This combined approach prevents mode collapse that occurs with pure consistency training while reducing flip rate from 14.6% to 4.4% ($p=0.002$, two-proportion z-test) and margin difference from 1.63 to 0.33 (79.5% reduction). Accuracy remains stable at 84.2% baseline versus 82.3% after training (-1.9pp, not significant). On PadChest Balanced ($n=250$), flip rate drops from 13.6% to 7.8%, mean margin difference drops from 1.08 to 0.35 (67.9% reduction), and accuracy increases from 66.4% to 69.4%. A layer-range ablation shows that early layers reduce margin differences more than mechanistically selected middle layers.}
}

Endnote

%0 Conference Paper
%T Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models
%A Binesh Sadanandan
%A Vahid Behzadan
%B Proceedings of the 7th Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Elizabeth Healey
%E Jason Fries
%E Tom Pollard
%E Shengpu Tang
%E Anna Zink
%E Tom Hartvigsen
%E Monica Agrawal
%E Sam Finlayson
%E Benjamin Glicksberg
%E Brett Beaulieu-Jones
%E Kai Wang
%E Daseyra Fontalvo
%E Tasmie Sarker
%E Irene Chen
%E Emily Alsentzer
%F pmlr-v333-sadanandan26a
%I PMLR
%P 703--720
%U https://proceedings.mlr.press/v333/sadanandan26a.html
%V 333
%X Medical vision-language models can give different yes or no answers to rephrasings of the same clinical question. We study this in MedGemma-4B using PSF-Med, which provides paraphrase pairs for systematic consistency evaluation on medical VQA. On MIMIC-CXR binary questions ($n=158$), the baseline flip rate is 14.6% and mean margin difference is 1.63 logits. We validate that Gemma Scope 2 Sparse Autoencoders (SAEs) transfer to MedGemma activations, achieving $R^2 \approx 0.997$ on both medical and general text ($n=100$ prompts each, $p<0.001$ for exceeding a 0.95 threshold). We then fine-tune Low-Rank Adaptation (LoRA) adapters with a combined loss that balances paraphrase consistency with answer accuracy. This combined approach prevents mode collapse that occurs with pure consistency training while reducing flip rate from 14.6% to 4.4% ($p=0.002$, two-proportion z-test) and margin difference from 1.63 to 0.33 (79.5% reduction). Accuracy remains stable at 84.2% baseline versus 82.3% after training (-1.9pp, not significant). On PadChest Balanced ($n=250$), flip rate drops from 13.6% to 7.8%, mean margin difference drops from 1.08 to 0.35 (67.9% reduction), and accuracy increases from 66.4% to 69.4%. A layer-range ablation shows that early layers reduce margin differences more than mechanistically selected middle layers.

APA

Sadanandan, B. & Behzadan, V. (2026). Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models. Proceedings of the 7th Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 333:703-720. Available from https://proceedings.mlr.press/v333/sadanandan26a.html.
