Gestalt Vision: A Dataset for Evaluating Gestalt Principles in Visual Perception

Jingyuan Sha, Hikaru Shindo, Kristian Kersting, Devendra Singh Dhami
Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, PMLR 284:873-890, 2025.

Abstract

Gestalt principles, established in the 1920s, describe how humans perceive individual elements as cohesive wholes. These principles, including proximity, similarity, closure, continuity, and symmetry, play a fundamental role in human perception, enabling structured visual interpretation. Despite their significance, existing AI benchmarks fail to assess models’ ability to infer patterns at the group level, where multiple objects governed by the same Gestalt principle are perceived as a single group. To address this gap, we introduce Gestalt Vision, a diagnostic framework designed to evaluate AI models’ ability not only to identify groups within patterns but also to reason about the underlying logical rules governing these patterns. Gestalt Vision provides structured visual tasks and baseline evaluations spanning neural, symbolic, and neural-symbolic approaches, uncovering key limitations in current models’ ability to perform human-like visual cognition. Our findings emphasize the necessity of incorporating richer perceptual mechanisms into AI reasoning frameworks. By bridging the gap between human perception and computational models, Gestalt Vision offers a crucial step toward developing AI systems with improved perceptual organization and visual reasoning capabilities.
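The proximity principle named above — nearby elements are perceived as one group — can be sketched as distance-threshold clustering over object centers. This is a hypothetical illustration of the kind of group-level inference the benchmark probes, not the dataset's actual generation or evaluation code; the function `proximity_groups` and its threshold are assumptions for the example.

```python
# Hypothetical sketch of Gestalt "proximity" grouping: merge 2-D object
# centers whose pairwise distance falls below a threshold, directly or
# transitively (single-linkage). Not the Gestalt Vision implementation.
from math import dist

def proximity_groups(points, threshold=1.0):
    """Greedy single-linkage grouping: points closer than `threshold`
    (directly or via a chain of close points) share one group."""
    groups = []
    for p in points:
        # Find every existing group that p is close to, and fuse them.
        merged = [g for g in groups if any(dist(p, q) < threshold for q in g)]
        for g in merged:
            groups.remove(g)
        groups.append([p] + [q for g in merged for q in g])
    return groups

# Two tight clusters far apart yield two perceptual groups.
pts = [(0, 0), (0.5, 0), (0.4, 0.3), (5, 5), (5.2, 4.9)]
print(len(proximity_groups(pts)))  # → 2
```

A model evaluated at the group level would then have to reason about properties of each recovered group (e.g. shared shape or color), rather than about individual objects.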

Cite this Paper


BibTeX
@InProceedings{pmlr-v284-sha25a,
  title     = {Gestalt Vision: A Dataset for Evaluating Gestalt Principles in Visual Perception},
  author    = {Sha, Jingyuan and Shindo, Hikaru and Kersting, Kristian and Dhami, Devendra Singh},
  booktitle = {Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning},
  pages     = {873--890},
  year      = {2025},
  editor    = {H. Gilpin, Leilani and Giunchiglia, Eleonora and Hitzler, Pascal and van Krieken, Emile},
  volume    = {284},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v284/main/assets/sha25a/sha25a.pdf},
  url       = {https://proceedings.mlr.press/v284/sha25a.html},
  abstract  = {Gestalt principles, established in the 1920s, describe how humans perceive individual elements as cohesive wholes. These principles, including proximity, similarity, closure, continuity, and symmetry, play a fundamental role in human perception, enabling structured visual interpretation. Despite their significance, existing AI benchmarks fail to assess models’ ability to infer patterns at the group level, where multiple objects following the same Gestalt principle are considered as a group using these principles. To address this gap, we introduce Gestalt Vision, a diagnostic framework designed to evaluate AI models’ ability to not only identify groups within patterns but also reason about the underlying logical rules governing these patterns. Gestalt Vision provides structured visual tasks and baseline evaluations spanning neural, symbolic, and neural-symbolic approaches, uncovering key limitations in current models’ ability to perform human-like visual cognition. Our findings emphasize the necessity of incorporating richer perceptual mechanisms into AI reasoning frameworks. By bridging the gap between human perception and computational models, Gestalt Vision offers a crucial step toward developing AI systems with improved perceptual organization and visual reasoning capabilities.}
}
Endnote
%0 Conference Paper
%T Gestalt Vision: A Dataset for Evaluating Gestalt Principles in Visual Perception
%A Jingyuan Sha
%A Hikaru Shindo
%A Kristian Kersting
%A Devendra Singh Dhami
%B Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2025
%E Leilani H. Gilpin
%E Eleonora Giunchiglia
%E Pascal Hitzler
%E Emile van Krieken
%F pmlr-v284-sha25a
%I PMLR
%P 873--890
%U https://proceedings.mlr.press/v284/sha25a.html
%V 284
%X Gestalt principles, established in the 1920s, describe how humans perceive individual elements as cohesive wholes. These principles, including proximity, similarity, closure, continuity, and symmetry, play a fundamental role in human perception, enabling structured visual interpretation. Despite their significance, existing AI benchmarks fail to assess models’ ability to infer patterns at the group level, where multiple objects following the same Gestalt principle are considered as a group using these principles. To address this gap, we introduce Gestalt Vision, a diagnostic framework designed to evaluate AI models’ ability to not only identify groups within patterns but also reason about the underlying logical rules governing these patterns. Gestalt Vision provides structured visual tasks and baseline evaluations spanning neural, symbolic, and neural-symbolic approaches, uncovering key limitations in current models’ ability to perform human-like visual cognition. Our findings emphasize the necessity of incorporating richer perceptual mechanisms into AI reasoning frameworks. By bridging the gap between human perception and computational models, Gestalt Vision offers a crucial step toward developing AI systems with improved perceptual organization and visual reasoning capabilities.
APA
Sha, J., Shindo, H., Kersting, K. & Dhami, D. S. (2025). Gestalt Vision: A Dataset for Evaluating Gestalt Principles in Visual Perception. Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, in Proceedings of Machine Learning Research 284:873-890. Available from https://proceedings.mlr.press/v284/sha25a.html.