PatchPrune: Reducing Hallucinations in Vision Language Models by Pruning Redundant Image Patches
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:298-304, 2025.
Abstract
Large language models (LLMs) have advanced significantly in natural language processing, and vision language models (VLMs) have extended this progress to tasks such as image captioning and visual question answering (VQA). Despite this success, VLMs often generate hallucinated or factually inconsistent content. Traditional methods focus on improving model reasoning by modifying the inference procedure; we instead propose a new approach, PatchPrune, which dynamically prunes redundant or uninformative image patches using a composite importance score based on activation magnitude and feature entropy. By reducing input noise, PatchPrune enables the model to focus on relevant features, improving the accuracy and reliability of its outputs. Experimental results show that PatchPrune enhances multimodal reasoning and mitigates hallucinations effectively.
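The abstract only names the two signals behind the importance score, so the following is a minimal sketch of how such a scoring-and-pruning step might look, assuming per-patch embeddings from the vision encoder. The weighting `alpha`, the `keep_ratio`, the sign given to the entropy term, and the softmax-based entropy formulation are illustrative assumptions, not the paper's definitions.

```python
import torch

def patch_importance(patch_feats: torch.Tensor, alpha: float = 0.5,
                     eps: float = 1e-8) -> torch.Tensor:
    """Score each image patch from activation magnitude and feature entropy.

    patch_feats: (num_patches, dim) patch embeddings from the vision encoder.
    Returns a (num_patches,) score; higher means the patch is kept longer.
    """
    # Activation magnitude: L2 norm of each patch embedding.
    magnitude = patch_feats.norm(dim=-1)

    # Feature entropy: treat each patch's softmaxed feature vector as a
    # distribution; a flat (high-entropy) profile is assumed here to signal
    # an uninformative patch, so entropy is subtracted in the composite score.
    probs = torch.softmax(patch_feats, dim=-1)
    entropy = -(probs * (probs + eps).log()).sum(dim=-1)

    # Composite score: a simple weighted combination (alpha is a hypothetical knob).
    return alpha * magnitude - (1.0 - alpha) * entropy


def prune_patches(patch_feats: torch.Tensor, keep_ratio: float = 0.6) -> torch.Tensor:
    """Keep only the top-scoring patches before they reach the language model."""
    scores = patch_importance(patch_feats)
    k = max(1, int(keep_ratio * patch_feats.size(0)))
    keep_idx = scores.topk(k).indices.sort().values  # preserve spatial order
    return patch_feats[keep_idx]
```

In this sketch the pruned patch sequence would simply replace the full sequence as visual input to the language model; how the actual method selects the pruning ratio or integrates with the VLM pipeline is not specified in the abstract.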