Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?

Antonia Wüst, Tim Tobiasch, Lukas Helff, Inga Ibs, Wolfgang Stammer, Devendra Singh Dhami, Constantin A. Rothkopf, Kristian Kersting
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:68118-68142, 2025.

Abstract

Recently, newly developed Vision-Language Models (VLMs), such as OpenAI’s o1, have emerged, seemingly demonstrating advanced reasoning capabilities across text and image modalities. However, the depth of these advances in language-guided perception and abstract reasoning remains underexplored, and it is unclear whether these models can truly live up to their ambitious promises. To assess the progress and identify shortcomings, we enter the wonderland of Bongard problems, a set of classic visual reasoning puzzles that require human-like abilities of pattern recognition and abstract reasoning. With our extensive evaluation setup, we show that while VLMs occasionally succeed in identifying discriminative concepts and solving some of the problems, they frequently falter. Surprisingly, even elementary concepts that may seem trivial to humans, such as simple spirals, pose significant challenges. Moreover, when explicitly asked to recognize ground truth concepts, they continue to falter, suggesting not only a lack of understanding of these elementary visual concepts but also an inability to generalize to unseen concepts. We compare the results of VLMs to human performance and observe that a significant gap remains between human visual reasoning capabilities and machine cognition.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wust25a,
  title     = {Bongard in Wonderland: Visual Puzzles that Still Make {AI} Go Mad?},
  author    = {W\"{u}st, Antonia and Tobiasch, Tim and Helff, Lukas and Ibs, Inga and Stammer, Wolfgang and Dhami, Devendra Singh and Rothkopf, Constantin A. and Kersting, Kristian},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {68118--68142},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wust25a/wust25a.pdf},
  url       = {https://proceedings.mlr.press/v267/wust25a.html},
  abstract  = {Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's o1, have emerged, seemingly demonstrating advanced reasoning capabilities across text and image modalities. However, the depth of these advances in language-guided perception and abstract reasoning remains underexplored, and it is unclear whether these models can truly live up to their ambitious promises. To assess the progress and identify shortcomings, we enter the wonderland of Bongard problems, a set of classic visual reasoning puzzles that require human-like abilities of pattern recognition and abstract reasoning. With our extensive evaluation setup, we show that while VLMs occasionally succeed in identifying discriminative concepts and solving some of the problems, they frequently falter. Surprisingly, even elementary concepts that may seem trivial to humans, such as simple spirals, pose significant challenges. Moreover, when explicitly asked to recognize ground truth concepts, they continue to falter, suggesting not only a lack of understanding of these elementary visual concepts but also an inability to generalize to unseen concepts. We compare the results of VLMs to human performance and observe that a significant gap remains between human visual reasoning capabilities and machine cognition.}
}
Endnote
%0 Conference Paper
%T Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
%A Antonia Wüst
%A Tim Tobiasch
%A Lukas Helff
%A Inga Ibs
%A Wolfgang Stammer
%A Devendra Singh Dhami
%A Constantin A. Rothkopf
%A Kristian Kersting
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wust25a
%I PMLR
%P 68118--68142
%U https://proceedings.mlr.press/v267/wust25a.html
%V 267
%X Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's o1, have emerged, seemingly demonstrating advanced reasoning capabilities across text and image modalities. However, the depth of these advances in language-guided perception and abstract reasoning remains underexplored, and it is unclear whether these models can truly live up to their ambitious promises. To assess the progress and identify shortcomings, we enter the wonderland of Bongard problems, a set of classic visual reasoning puzzles that require human-like abilities of pattern recognition and abstract reasoning. With our extensive evaluation setup, we show that while VLMs occasionally succeed in identifying discriminative concepts and solving some of the problems, they frequently falter. Surprisingly, even elementary concepts that may seem trivial to humans, such as simple spirals, pose significant challenges. Moreover, when explicitly asked to recognize ground truth concepts, they continue to falter, suggesting not only a lack of understanding of these elementary visual concepts but also an inability to generalize to unseen concepts. We compare the results of VLMs to human performance and observe that a significant gap remains between human visual reasoning capabilities and machine cognition.
APA
Wüst, A., Tobiasch, T., Helff, L., Ibs, I., Stammer, W., Dhami, D.S., Rothkopf, C.A. & Kersting, K. (2025). Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:68118-68142. Available from https://proceedings.mlr.press/v267/wust25a.html.