Towards Rationale-Answer Alignment of LVLMs via Self-Rationale Calibration

Yuanchen Wu, Ke Yan, Shouhong Ding, Ziyin Zhou, Xiaoqiang Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:67938-67958, 2025.

Abstract

Large Vision-Language Models (LVLMs) have demonstrated strong visual question answering capabilities. However, they still struggle to align the rationale with the generated answer, leading to inconsistent reasoning and incorrect responses. To this end, this paper introduces the Self-Rationale Calibration (SRC) framework, which iteratively calibrates the alignment between rationales and answers. SRC begins by employing a lightweight “rationale fine-tuning” approach, which modifies the model’s response format to require a rationale before deriving the answer, without explicit prompting. Next, SRC samples a diverse set of candidate responses from the fine-tuned LVLM for each sample and applies a proposed pairwise scoring strategy, using a tailored scoring model, R-Scorer, to evaluate both the rationale quality and the factual consistency of the candidates. Through a confidence-weighted preference curation process, SRC recasts the alignment calibration as preference fine-tuning, leading to significant improvements in the perception, reasoning, and generalization of LVLMs across multiple benchmarks. Our results underscore the importance of rationale-oriented alignment in exploring the potential of LVLMs.
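
To make the pipeline concrete, below is a minimal Python sketch of one SRC calibration round as described in the abstract. It is an illustration under stated assumptions, not the paper's implementation: the helpers generate_candidates, r_scorer_compare, and preference_finetune are hypothetical stand-ins, and the win-count aggregation and confidence margin are one plausible reading of the pairwise scoring and confidence-weighted curation steps.

import itertools
import random
from collections import defaultdict

# ---- Hypothetical stand-ins (not from the paper); replace with real ones. ----

def generate_candidates(model, image, question, n):
    # Placeholder: sample n diverse rationale-then-answer responses.
    return [f"rationale {i} -> answer {i}" for i in range(n)]

def r_scorer_compare(image, question, cand_a, cand_b):
    # Placeholder for R-Scorer's pairwise judgment: a score in [0, 1] for
    # cand_a's rationale quality and factual consistency relative to cand_b.
    return random.random()

def preference_finetune(model, weighted_pairs):
    # Placeholder: preference fine-tuning (e.g., a DPO-style objective)
    # with each pair weighted by its curation confidence.
    return model

# ---- One SRC calibration round, following the abstract's steps. ----

def src_round(model, dataset, num_candidates=8):
    pairs = []
    for image, question in dataset:
        # 1) Sample a diverse candidate set from the rationale-fine-tuned LVLM.
        candidates = generate_candidates(model, image, question, num_candidates)

        # 2) Pairwise scoring: aggregate R-Scorer judgments over ordered pairs.
        wins = defaultdict(float)
        for i, j in itertools.permutations(range(num_candidates), 2):
            wins[i] += r_scorer_compare(image, question, candidates[i], candidates[j])

        # 3) Confidence-weighted curation: keep the best and worst candidates
        #    as a preference pair, weighted by their normalized score margin.
        ranked = sorted(range(num_candidates), key=wins.__getitem__, reverse=True)
        best, worst = ranked[0], ranked[-1]
        confidence = (wins[best] - wins[worst]) / (num_candidates - 1)
        pairs.append((image, question, candidates[best], candidates[worst], confidence))

    # 4) Calibrate rationale-answer alignment via preference fine-tuning.
    return preference_finetune(model, pairs)

In this reading, the score margin between the strongest and weakest candidates weights each preference pair, so samples where R-Scorer is ambivalent contribute less to the fine-tuning signal.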

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wu25am,
  title     = {Towards Rationale-Answer Alignment of {LVLM}s via Self-Rationale Calibration},
  author    = {Wu, Yuanchen and Yan, Ke and Ding, Shouhong and Zhou, Ziyin and Li, Xiaoqiang},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {67938--67958},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wu25am/wu25am.pdf},
  url       = {https://proceedings.mlr.press/v267/wu25am.html},
  abstract  = {Large Vision-Language Models (LVLMs) have demonstrated strong visual question answering capabilities. However, they still struggle to align the rationale with the generated answer, leading to inconsistent reasoning and incorrect responses. To this end, this paper introduces the Self-Rationale Calibration (SRC) framework, which iteratively calibrates the alignment between rationales and answers. SRC begins by employing a lightweight “rationale fine-tuning” approach, which modifies the model’s response format to require a rationale before deriving the answer, without explicit prompting. Next, SRC samples a diverse set of candidate responses from the fine-tuned LVLM for each sample and applies a proposed pairwise scoring strategy, using a tailored scoring model, R-Scorer, to evaluate both the rationale quality and the factual consistency of the candidates. Through a confidence-weighted preference curation process, SRC recasts the alignment calibration as preference fine-tuning, leading to significant improvements in the perception, reasoning, and generalization of LVLMs across multiple benchmarks. Our results underscore the importance of rationale-oriented alignment in exploring the potential of LVLMs.}
}
Endnote
%0 Conference Paper
%T Towards Rationale-Answer Alignment of LVLMs via Self-Rationale Calibration
%A Yuanchen Wu
%A Ke Yan
%A Shouhong Ding
%A Ziyin Zhou
%A Xiaoqiang Li
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wu25am
%I PMLR
%P 67938--67958
%U https://proceedings.mlr.press/v267/wu25am.html
%V 267
%X Large Vision-Language Models (LVLMs) have demonstrated strong visual question answering capabilities. However, they still struggle to align the rationale with the generated answer, leading to inconsistent reasoning and incorrect responses. To this end, this paper introduces the Self-Rationale Calibration (SRC) framework, which iteratively calibrates the alignment between rationales and answers. SRC begins by employing a lightweight “rationale fine-tuning” approach, which modifies the model’s response format to require a rationale before deriving the answer, without explicit prompting. Next, SRC samples a diverse set of candidate responses from the fine-tuned LVLM for each sample and applies a proposed pairwise scoring strategy, using a tailored scoring model, R-Scorer, to evaluate both the rationale quality and the factual consistency of the candidates. Through a confidence-weighted preference curation process, SRC recasts the alignment calibration as preference fine-tuning, leading to significant improvements in the perception, reasoning, and generalization of LVLMs across multiple benchmarks. Our results underscore the importance of rationale-oriented alignment in exploring the potential of LVLMs.
APA
Wu, Y., Yan, K., Ding, S., Zhou, Z. & Li, X. (2025). Towards Rationale-Answer Alignment of LVLMs via Self-Rationale Calibration. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:67938-67958. Available from https://proceedings.mlr.press/v267/wu25am.html.