Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning?

Joy Hsu, Gabriel Poesia, Jiajun Wu, Noah Goodman
Proceedings on "I Can't Believe It's Not Better: Failure Modes in the Age of Foundation Models" at NeurIPS 2023 Workshops, PMLR 239:21-28, 2023.

Abstract

When humans reason about complex text-based questions, we leverage diagrammatic abstractions drawn on a visual scratchpad. In this paper, we introduce and explore the capabilities of Visual-Scratchpad, a method that augments a large language foundation model (LLM) with diagrammatic execution and readout. We enable the LLM to generate drawing commands and to readout abstractions from the resulting picture. The visual readout operation uses a visual foundation model, optionally finetuned with expert iteration. Here, we show that although Visual-Scratchpad outperforms an inference-only LLM, it surprisingly yields worse performance compared to a single finetuned LLM. Through experiments, we propose that this gap is due to the failure mode of vision foundation models in understanding abstractions in diagrams.

Cite this Paper


BibTeX
@InProceedings{pmlr-v239-hsu23a,
  title     = {Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning?},
  author    = {Hsu, Joy and Poesia, Gabriel and Wu, Jiajun and Goodman, Noah},
  booktitle = {Proceedings on "I Can't Believe It's Not Better: Failure Modes in the Age of Foundation Models" at NeurIPS 2023 Workshops},
  pages     = {21--28},
  year      = {2023},
  editor    = {Antorán, Javier and Blaas, Arno and Buchanan, Kelly and Feng, Fan and Fortuin, Vincent and Ghalebikesabi, Sahra and Kriegler, Andreas and Mason, Ian and Rohde, David and Ruiz, Francisco J. R. and Uelwer, Tobias and Xie, Yubin and Yang, Rui},
  volume    = {239},
  series    = {Proceedings of Machine Learning Research},
  month     = {16 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v239/hsu23a/hsu23a.pdf},
  url       = {https://proceedings.mlr.press/v239/hsu23a.html},
  abstract  = {When humans reason about complex text-based questions, we leverage diagrammatic abstractions drawn on a visual scratchpad. In this paper, we introduce and explore the capabilities of Visual-Scratchpad, a method that augments a large language foundation model (LLM) with diagrammatic execution and readout. We enable the LLM to generate drawing commands and to readout abstractions from the resulting picture. The visual readout operation uses a visual foundation model, optionally finetuned with expert iteration. Here, we show that although Visual-Scratchpad outperforms an inference-only LLM, it surprisingly yields worse performance compared to a single finetuned LLM. Through experiments, we propose that this gap is due to the failure mode of vision foundation models in understanding abstractions in diagrams.}
}
Endnote
%0 Conference Paper
%T Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning?
%A Joy Hsu
%A Gabriel Poesia
%A Jiajun Wu
%A Noah Goodman
%B Proceedings on "I Can't Believe It's Not Better: Failure Modes in the Age of Foundation Models" at NeurIPS 2023 Workshops
%C Proceedings of Machine Learning Research
%D 2023
%E Javier Antorán
%E Arno Blaas
%E Kelly Buchanan
%E Fan Feng
%E Vincent Fortuin
%E Sahra Ghalebikesabi
%E Andreas Kriegler
%E Ian Mason
%E David Rohde
%E Francisco J. R. Ruiz
%E Tobias Uelwer
%E Yubin Xie
%E Rui Yang
%F pmlr-v239-hsu23a
%I PMLR
%P 21--28
%U https://proceedings.mlr.press/v239/hsu23a.html
%V 239
%X When humans reason about complex text-based questions, we leverage diagrammatic abstractions drawn on a visual scratchpad. In this paper, we introduce and explore the capabilities of Visual-Scratchpad, a method that augments a large language foundation model (LLM) with diagrammatic execution and readout. We enable the LLM to generate drawing commands and to readout abstractions from the resulting picture. The visual readout operation uses a visual foundation model, optionally finetuned with expert iteration. Here, we show that although Visual-Scratchpad outperforms an inference-only LLM, it surprisingly yields worse performance compared to a single finetuned LLM. Through experiments, we propose that this gap is due to the failure mode of vision foundation models in understanding abstractions in diagrams.
APA
Hsu, J., Poesia, G., Wu, J. & Goodman, N. (2023). Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning? Proceedings on "I Can't Believe It's Not Better: Failure Modes in the Age of Foundation Models" at NeurIPS 2023 Workshops, in Proceedings of Machine Learning Research 239:21-28. Available from https://proceedings.mlr.press/v239/hsu23a.html.