A Psychological Theory of Explainability

Scott Cheng-Hsin Yang, Nils Erik Tomas Folke, Patrick Shafto
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:25007-25021, 2022.

Abstract

The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanations they themselves would give. Comparison is formalized via Shepard’s universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study of AI image classifications with saliency-map explanations demonstrates that our theory quantitatively matches participants’ predictions of the AI.
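
The comparison mechanism described in the abstract can be illustrated with a minimal sketch. The snippet below is a hypothetical rendering, not the authors' exact model: it treats the AI's saliency map and the explainee's own saliency map as attention distributions, measures the distance between them, and converts that distance into a predicted probability of agreement via Shepard's exponential law of generalization. The saliency maps, the choice of distance, and the scale parameter k are all assumptions made for this example.

```python
# Illustrative sketch of Shepard-style generalization applied to
# saliency-map comparison. All modeling choices here (total variation
# distance, the scale parameter k, the toy maps) are hypothetical.

import numpy as np

def shepard_generalization(d: float, k: float = 1.0) -> float:
    """Shepard's law: generalization strength decays exponentially with
    distance in the similarity space."""
    return float(np.exp(-k * d))

def predicted_agreement(ai_saliency: np.ndarray,
                        human_saliency: np.ndarray,
                        k: float = 1.0) -> float:
    """Predict how strongly an explainee endorses the AI's classification,
    given the AI's saliency map and the saliency map the explainee would
    have produced for the same label."""
    # Normalize both maps so they are comparable attention distributions.
    ai = ai_saliency / ai_saliency.sum()
    human = human_saliency / human_saliency.sum()
    # One possible distance between the two maps: total variation distance.
    d = 0.5 * np.abs(ai - human).sum()
    return shepard_generalization(d, k)

# Toy usage: two 4x4 saliency maps over the same image.
rng = np.random.default_rng(0)
ai_map = rng.random((4, 4))
human_map = ai_map + 0.1 * rng.random((4, 4))  # explainee's map is similar
print(predicted_agreement(ai_map, human_map))   # close to 1 -> agreement
```

When the explainee's own saliency map closely matches the AI's explanation, the distance is small and the predicted agreement is high; a dissimilar explanation drives the prediction toward disagreement.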

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-yang22c,
  title     = {A Psychological Theory of Explainability},
  author    = {Yang, Scott Cheng-Hsin and Folke, Nils Erik Tomas and Shafto, Patrick},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {25007--25021},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/yang22c/yang22c.pdf},
  url       = {https://proceedings.mlr.press/v162/yang22c.html}
}
APA
Yang, S.C., Folke, N.E.T. & Shafto, P. (2022). A Psychological Theory of Explainability. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:25007-25021. Available from https://proceedings.mlr.press/v162/yang22c.html.