Framework for Evaluating Faithfulness of Local Explanations

Sanjoy Dasgupta, Nave Frost, Michal Moshkovitz
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:4794-4815, 2022.

Abstract

We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.
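The abstract's estimators can be made concrete with a small Monte Carlo sketch. The definition below is an illustrative paraphrase of a consistency-style measure (the chance that a point covered by x's explanation receives x's prediction), not necessarily the paper's exact formalization; the toy model `f`, the anchor-style `explain`, and `estimate_consistency` are hypothetical names introduced here for illustration.

```python
# Hedged sketch: a plug-in Monte Carlo estimator for a consistency-style
# faithfulness measure of an anchor-like explanation system. The precise
# definitions are assumptions for illustration, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

# Toy black-box model: predicts 1 iff the first coordinate is positive.
def f(x):
    return int(x[0] > 0)

# Toy anchor-style explainer: e(x) is a rule (predicate) an input must
# satisfy for the explanation to apply. Here: same sign of x[0] as x.
def explain(x):
    sign = x[0] > 0
    return lambda z: (z[0] > 0) == sign

def estimate_consistency(f, explain, data, n_pairs=10_000, rng=rng):
    """Fraction of sampled pairs (x, x'), with x' covered by e(x),
    whose predictions agree. Depends on the data distribution, as the
    abstract notes, via how pairs are sampled."""
    agree, covered = 0, 0
    for _ in range(n_pairs):
        x = data[rng.integers(len(data))]
        x2 = data[rng.integers(len(data))]
        rule = explain(x)
        if rule(x2):                      # x' falls under x's explanation
            covered += 1
            agree += int(f(x) == f(x2))
    return agree / max(covered, 1)

data = rng.normal(size=(5000, 3))
# Sanity check: here the explanation exactly captures the model's decision
# rule, so the estimate should be 1.0.
print(f"estimated consistency: {estimate_consistency(f, explain, data):.3f}")
```

A sufficiency-style quantity could be estimated analogously by resampling over explanations rather than pairs of points; the paper's sample complexity bounds govern how many draws such estimators need.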

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-dasgupta22a,
  title     = {Framework for Evaluating Faithfulness of Local Explanations},
  author    = {Dasgupta, Sanjoy and Frost, Nave and Moshkovitz, Michal},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {4794--4815},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/dasgupta22a/dasgupta22a.pdf},
  url       = {https://proceedings.mlr.press/v162/dasgupta22a.html}
}
