CRAFT: A Neuro-Symbolic Framework for Visual Functional Affordance Grounding

Zhou Chen, Joe Lin, Sathyanarayanan N. Aakur
Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, PMLR 284:343-352, 2025.

Abstract

We introduce CRAFT, a neuro-symbolic framework for interpretable affordance grounding, which identifies the objects in a scene that enable a given action (e.g., “cut”). CRAFT integrates structured commonsense priors from ConceptNet and language models with visual evidence from CLIP, using an energy-based reasoning loop to refine predictions iteratively. This process yields transparent, goal-driven decisions grounded in both symbolic and perceptual structures. Experiments in multi-object, label-free settings demonstrate that CRAFT enhances accuracy while improving interpretability, providing a step toward robust and trustworthy scene understanding.
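
The energy-based reasoning loop sketched in the abstract can be illustrated in a few lines of Python. The following is a minimal sketch, not the paper's implementation: the object list, the prior and CLIP-similarity numbers, the weight lam, the log-linear energy, and the multiplicative update rule are all assumptions made for this example.

    import numpy as np

    # Hypothetical candidate objects and evidence for the action "cut".
    # In CRAFT these scores would come from ConceptNet/LM priors and from
    # CLIP image-text similarity; the numbers here are invented.
    objects = ["knife", "apple", "plate", "scissors"]
    prior  = np.array([0.9, 0.2, 0.1, 0.8])  # commonsense prior (e.g., UsedFor)
    visual = np.array([0.7, 0.4, 0.3, 0.6])  # CLIP similarity of each object crop

    lam = 0.5  # assumed trade-off between symbolic and visual evidence
    log_score = lam * np.log(prior) + (1 - lam) * np.log(visual)

    def energy(p):
        # Lower when the belief p concentrates on well-supported objects.
        return -(p * log_score).sum()

    p = np.full(len(objects), 1.0 / len(objects))  # uniform initial belief
    for step in range(5):
        p *= np.exp(log_score)  # multiplicative update toward lower energy
        p /= p.sum()            # renormalize into a distribution
        print(f"step {step}: energy = {energy(p):.3f}")

    print("grounded object for 'cut':", objects[int(np.argmax(p))])  # -> knife

Each update concentrates belief on objects supported by both evidence sources, so the printed energy decreases monotonically; in this toy setting the loop grounds “cut” to the knife.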

Cite this Paper

BibTeX
@InProceedings{pmlr-v284-chen25a,
  title     = {CRAFT: A Neuro-Symbolic Framework for Visual Functional Affordance Grounding},
  author    = {Chen, Zhou and Lin, Joe and Aakur, Sathyanarayanan N.},
  booktitle = {Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning},
  pages     = {343--352},
  year      = {2025},
  editor    = {H. Gilpin, Leilani and Giunchiglia, Eleonora and Hitzler, Pascal and van Krieken, Emile},
  volume    = {284},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v284/main/assets/chen25a/chen25a.pdf},
  url       = {https://proceedings.mlr.press/v284/chen25a.html},
  abstract  = {We introduce CRAFT, a neuro-symbolic framework for interpretable affordance grounding, which identifies the objects in a scene that enable a given action (e.g., “cut”). CRAFT integrates structured commonsense priors from ConceptNet and language models with visual evidence from CLIP, using an energy-based reasoning loop to refine predictions iteratively. This process yields transparent, goal-driven decisions grounded in both symbolic and perceptual structures. Experiments in multi-object, label-free settings demonstrate that CRAFT enhances accuracy while improving interpretability, providing a step toward robust and trustworthy scene understanding.}
}
Endnote
%0 Conference Paper
%T CRAFT: A Neuro-Symbolic Framework for Visual Functional Affordance Grounding
%A Zhou Chen
%A Joe Lin
%A Sathyanarayanan N. Aakur
%B Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2025
%E Leilani H. Gilpin
%E Eleonora Giunchiglia
%E Pascal Hitzler
%E Emile van Krieken
%F pmlr-v284-chen25a
%I PMLR
%P 343--352
%U https://proceedings.mlr.press/v284/chen25a.html
%V 284
%X We introduce CRAFT, a neuro-symbolic framework for interpretable affordance grounding, which identifies the objects in a scene that enable a given action (e.g., “cut”). CRAFT integrates structured commonsense priors from ConceptNet and language models with visual evidence from CLIP, using an energy-based reasoning loop to refine predictions iteratively. This process yields transparent, goal-driven decisions grounded in both symbolic and perceptual structures. Experiments in multi-object, label-free settings demonstrate that CRAFT enhances accuracy while improving interpretability, providing a step toward robust and trustworthy scene understanding.
APA
Chen, Z., Lin, J. & Aakur, S.N. (2025). CRAFT: A Neuro-Symbolic Framework for Visual Functional Affordance Grounding. Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, in Proceedings of Machine Learning Research 284:343-352. Available from https://proceedings.mlr.press/v284/chen25a.html.