Explainable Zero-Shot Visual Question Answering via Logic-Based Reasoning

Thomas Eiter, Jan Hadl, Nelson Higuera Ruiz, Lukas Lange, Johannes Oetsch, Bileam Scheuvens, Jannik Strötgen
Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, PMLR 284:977-991, 2025.

Abstract

Visual Question Answering (VQA), the task of answering natural language questions about images, remains a challenge for AI systems. To enhance adaptability and reduce training overhead, we address VQA in a zero-shot setting by leveraging pre-trained neural modules without additional fine-tuning. Our proposed hybrid neurosymbolic framework, whose capabilities are demonstrated on the challenging GQA dataset, integrates neural and symbolic components through logic-based reasoning via Answer-Set Programming. Specifically, our pipeline employs large language models for semantic parsing of input questions, followed by the generation of a scene graph that captures relevant visual content. Interpretable rules then operate on the symbolic representations of both the question and the scene graph to derive an answer. Our framework provides a key advantage: it enables full transparency into the reasoning process. Using an existing explanation tool, we illustrate how our method fosters trust by making decisions interpretable and facilitates error analysis when predictions are incorrect. Beyond explaining its own reasoning, our framework can also explain answers from more opaque models by integrating their answers into our system, enabling broader interpretability in VQA.
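
To make the reasoning step concrete, here is a minimal, hypothetical sketch using the clingo Python API (the standard Answer-Set Programming solver from Potassco). The predicate names (object/2, attr/2, rel/3, query/2, answer/1) and the toy scene are illustrative assumptions, not the encoding used in the paper; the point is only to show how interpretable rules can derive an answer from symbolic question and scene-graph facts.

    # Hypothetical sketch: answering "What color is the ball?" with
    # Answer-Set Programming via the clingo Python API. All predicate
    # names are invented for illustration, not the paper's encoding.
    import clingo

    # Symbolic scene graph (in practice produced by a neural perception module).
    SCENE = """
    object(o1, dog).   attr(o1, brown).
    object(o2, ball).  attr(o2, red).
    rel(o1, holding, o2).
    """

    # Symbolic form of the parsed question (in practice produced by an LLM).
    QUESTION = "query(attribute, ball)."

    # An interpretable rule: the answer to an attribute query about class C
    # is any attribute A of some object O of class C in the scene.
    RULES = """
    answer(A) :- query(attribute, C), object(O, C), attr(O, A).
    #show answer/1.
    """

    ctl = clingo.Control()
    ctl.add("base", [], SCENE + QUESTION + RULES)
    ctl.ground([("base", [])])
    with ctl.solve(yield_=True) as handle:
        for model in handle:
            print([str(atom) for atom in model.symbols(shown=True)])  # ['answer(red)']

Because the answer is derived by explicit rules over explicit facts, an off-the-shelf ASP explanation tool can trace which facts and rules support it; this is the kind of transparency the paper builds on.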

Cite this Paper


BibTeX
@InProceedings{pmlr-v284-eiter25a,
  title     = {Explainable Zero-Shot Visual Question Answering via Logic-Based Reasoning},
  author    = {Eiter, Thomas and Hadl, Jan and Ruiz, Nelson Higuera and Lange, Lukas and Oetsch, Johannes and Scheuvens, Bileam and Str\"{o}tgen, Jannik},
  booktitle = {Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning},
  pages     = {977--991},
  year      = {2025},
  editor    = {Gilpin, Leilani H. and Giunchiglia, Eleonora and Hitzler, Pascal and van Krieken, Emile},
  volume    = {284},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v284/main/assets/eiter25a/eiter25a.pdf},
  url       = {https://proceedings.mlr.press/v284/eiter25a.html}
}
EndNote
%0 Conference Paper
%T Explainable Zero-Shot Visual Question Answering via Logic-Based Reasoning
%A Thomas Eiter
%A Jan Hadl
%A Nelson Higuera Ruiz
%A Lukas Lange
%A Johannes Oetsch
%A Bileam Scheuvens
%A Jannik Strötgen
%B Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2025
%E Leilani H. Gilpin
%E Eleonora Giunchiglia
%E Pascal Hitzler
%E Emile van Krieken
%F pmlr-v284-eiter25a
%I PMLR
%P 977--991
%U https://proceedings.mlr.press/v284/eiter25a.html
%V 284
APA
Eiter, T., Hadl, J., Ruiz, N.H., Lange, L., Oetsch, J., Scheuvens, B. & Strötgen, J. (2025). Explainable Zero-Shot Visual Question Answering via Logic-Based Reasoning. Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, in Proceedings of Machine Learning Research 284:977-991. Available from https://proceedings.mlr.press/v284/eiter25a.html.
