RULEBREAKERS: Challenging LLMs at the Crossroads between Formal Logic and Human-like Reasoning

Jason Chan, Robert J. Gaizauskas, Zhixue Zhao
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:7276-7305, 2025.

Abstract

Formal logic enables computers to reason in natural language by representing sentences in symbolic forms and applying rules to derive conclusions. However, in what our study characterizes as "rulebreaker" scenarios, this method can lead to conclusions that are typically not inferred or accepted by humans given their common sense and factual knowledge. Inspired by works in cognitive science, we create RULEBREAKERS, the first dataset for rigorously evaluating the ability of large language models (LLMs) to recognize and respond to rulebreakers (versus non-rulebreakers) in a knowledge-informed and human-like manner. Evaluating seven LLMs, we find that most models achieve mediocre accuracy on RULEBREAKERS and exhibit some tendency to over-rigidly apply logical rules, unlike what is expected from typical human reasoners. Further analysis suggests that this apparent failure is potentially associated with the models’ poor utilization of their world knowledge and their attention distribution patterns. Whilst revealing a limitation of current LLMs, our study also provides a timely counterbalance to a growing body of recent works that propose methods relying on formal logic to improve LLMs’ general reasoning capabilities, highlighting their risk of further increasing divergence between LLMs and human-like reasoning.

Cite this Paper
BibTeX
@InProceedings{pmlr-v267-chan25a,
  title     = {{RULEBREAKERS}: Challenging {LLM}s at the Crossroads between Formal Logic and Human-like Reasoning},
  author    = {Chan, Jason and Gaizauskas, Robert J. and Zhao, Zhixue},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {7276--7305},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/chan25a/chan25a.pdf},
  url       = {https://proceedings.mlr.press/v267/chan25a.html},
  abstract  = {Formal logic enables computers to reason in natural language by representing sentences in symbolic forms and applying rules to derive conclusions. However, in what our study characterizes as "rulebreaker" scenarios, this method can lead to conclusions that are typically not inferred or accepted by humans given their common sense and factual knowledge. Inspired by works in cognitive science, we create RULEBREAKERS, the first dataset for rigorously evaluating the ability of large language models (LLMs) to recognize and respond to rulebreakers (versus non-rulebreakers) in a knowledge-informed and human-like manner. Evaluating seven LLMs, we find that most models achieve mediocre accuracy on RULEBREAKERS and exhibit some tendency to over-rigidly apply logical rules, unlike what is expected from typical human reasoners. Further analysis suggests that this apparent failure is potentially associated with the models' poor utilization of their world knowledge and their attention distribution patterns. Whilst revealing a limitation of current LLMs, our study also provides a timely counterbalance to a growing body of recent works that propose methods relying on formal logic to improve LLMs' general reasoning capabilities, highlighting their risk of further increasing divergence between LLMs and human-like reasoning.}
}
Endnote
%0 Conference Paper
%T RULEBREAKERS: Challenging LLMs at the Crossroads between Formal Logic and Human-like Reasoning
%A Jason Chan
%A Robert J. Gaizauskas
%A Zhixue Zhao
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-chan25a
%I PMLR
%P 7276--7305
%U https://proceedings.mlr.press/v267/chan25a.html
%V 267
%X Formal logic enables computers to reason in natural language by representing sentences in symbolic forms and applying rules to derive conclusions. However, in what our study characterizes as "rulebreaker" scenarios, this method can lead to conclusions that are typically not inferred or accepted by humans given their common sense and factual knowledge. Inspired by works in cognitive science, we create RULEBREAKERS, the first dataset for rigorously evaluating the ability of large language models (LLMs) to recognize and respond to rulebreakers (versus non-rulebreakers) in a knowledge-informed and human-like manner. Evaluating seven LLMs, we find that most models achieve mediocre accuracy on RULEBREAKERS and exhibit some tendency to over-rigidly apply logical rules, unlike what is expected from typical human reasoners. Further analysis suggests that this apparent failure is potentially associated with the models' poor utilization of their world knowledge and their attention distribution patterns. Whilst revealing a limitation of current LLMs, our study also provides a timely counterbalance to a growing body of recent works that propose methods relying on formal logic to improve LLMs' general reasoning capabilities, highlighting their risk of further increasing divergence between LLMs and human-like reasoning.
APA
Chan, J., Gaizauskas, R.J., & Zhao, Z. (2025). RULEBREAKERS: Challenging LLMs at the Crossroads between Formal Logic and Human-like Reasoning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:7276-7305. Available from https://proceedings.mlr.press/v267/chan25a.html.

Related Material