Position: Principles of Animal Cognition to Improve LLM Evaluations

Sunayana Rane, Cyrus F. Kirkman, Graham Todd, Amanda Royka, Ryan M.C. Law, Erica Cartmill, Jacob Gates Foster
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:82051-82061, 2025.

Abstract

It has become increasingly challenging to understand and evaluate LLM capabilities as these models exhibit a broader range of behaviors. In this position paper, we argue that LLM researchers should draw on the lessons from another field which has developed a rich set of experimental paradigms and design practices for probing the behavior of complex intelligent systems: animal cognition. We present five core principles of evaluation drawn from animal cognition research, and explain how they provide invaluable guidance for understanding LLM capabilities and behavior. We ground these principles in an empirical case study, and show how they can already provide a richer picture of one particular reasoning capability: transitive inference.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-rane25a,
  title     = {Position: Principles of Animal Cognition to Improve {LLM} Evaluations},
  author    = {Rane, Sunayana and Kirkman, Cyrus F. and Todd, Graham and Royka, Amanda and Law, Ryan M.C. and Cartmill, Erica and Foster, Jacob Gates},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {82051--82061},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/rane25a/rane25a.pdf},
  url       = {https://proceedings.mlr.press/v267/rane25a.html},
  abstract  = {It has become increasingly challenging to understand and evaluate LLM capabilities as these models exhibit a broader range of behaviors. In this position paper, we argue that LLM researchers should draw on the lessons from another field which has developed a rich set of experimental paradigms and design practices for probing the behavior of complex intelligent systems: animal cognition. We present five core principles of evaluation drawn from animal cognition research, and explain how they provide invaluable guidance for understanding LLM capabilities and behavior. We ground these principles in an empirical case study, and show how they can already provide a richer picture of one particular reasoning capability: transitive inference.}
}
Endnote
%0 Conference Paper
%T Position: Principles of Animal Cognition to Improve LLM Evaluations
%A Sunayana Rane
%A Cyrus F. Kirkman
%A Graham Todd
%A Amanda Royka
%A Ryan M.C. Law
%A Erica Cartmill
%A Jacob Gates Foster
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-rane25a
%I PMLR
%P 82051--82061
%U https://proceedings.mlr.press/v267/rane25a.html
%V 267
%X It has become increasingly challenging to understand and evaluate LLM capabilities as these models exhibit a broader range of behaviors. In this position paper, we argue that LLM researchers should draw on the lessons from another field which has developed a rich set of experimental paradigms and design practices for probing the behavior of complex intelligent systems: animal cognition. We present five core principles of evaluation drawn from animal cognition research, and explain how they provide invaluable guidance for understanding LLM capabilities and behavior. We ground these principles in an empirical case study, and show how they can already provide a richer picture of one particular reasoning capability: transitive inference.
APA
Rane, S., Kirkman, C.F., Todd, G., Royka, A., Law, R.M.C., Cartmill, E. & Foster, J.G. (2025). Position: Principles of Animal Cognition to Improve LLM Evaluations. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:82051-82061. Available from https://proceedings.mlr.press/v267/rane25a.html.