Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations.

Neil Jethani, Mukund Sudarshan, Yindalon Aphinyanaphongs, Rajesh Ranganath
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1459-1467, 2021.

Abstract

While the need for interpretable machine learning has been established, many common approaches are slow, lack fidelity, or are hard to evaluate. Amortized explanation methods reduce the cost of providing interpretations by learning a global selector model that returns feature importances for a single instance of data. The selector model is trained to optimize the fidelity of the interpretations, as evaluated by a predictor model for the target. Popular methods learn the selector and predictor models in concert, which we show allows predictions to be encoded within interpretations. We introduce EVAL-X, a method to quantitatively evaluate interpretations, and REAL-X, an amortized explanation method, both of which learn a predictor model that approximates the true data-generating distribution given any subset of the input. We show that EVAL-X can detect when predictions are encoded in interpretations and demonstrate the advantages of REAL-X through quantitative and radiologist evaluation.
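
To make the selector/predictor setup described above concrete, the sketch below shows one way an amortized explanation pipeline with a two-stage, REAL-X-style training scheme could look in PyTorch. This is a minimal illustration, not the authors' code: the class and function names (Selector, Predictor, train_predictor_with_random_masks, train_selector_against_frozen_predictor), the masking convention, the relaxed-Bernoulli gradient estimator, and the sparsity penalty are all illustrative assumptions rather than details taken from the paper.

# Minimal sketch of an amortized selector/predictor pair, assuming tabular
# features and a classification target. Illustrative only.
import torch
import torch.nn as nn

class Selector(nn.Module):
    """Maps an instance x to per-feature selection probabilities."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))

    def forward(self, x):
        return torch.sigmoid(self.net(x))  # shape (batch, d)

class Predictor(nn.Module):
    """Predicts y from a masked input (masked features concatenated with the mask)."""
    def __init__(self, d, n_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(),
                                 nn.Linear(64, n_classes))

    def forward(self, x, mask):
        return self.net(torch.cat([x * mask, mask], dim=-1))

def train_predictor_with_random_masks(predictor, loader, epochs=10, lr=1e-3):
    """Stage 1: fit the predictor on randomly masked inputs so it approximates
    the conditional distribution of y given an arbitrary feature subset,
    independently of any selector. Assumes loader yields (float features,
    integer class labels)."""
    opt = torch.optim.Adam(predictor.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            mask = torch.bernoulli(0.5 * torch.ones_like(x))  # random subsets
            loss = loss_fn(predictor(x, mask), y)
            opt.zero_grad(); loss.backward(); opt.step()

def train_selector_against_frozen_predictor(selector, predictor, loader,
                                            sparsity=1e-2, epochs=10, lr=1e-3):
    """Stage 2: optimize the selector for fidelity as judged by the frozen
    predictor, plus a sparsity penalty on the selection probabilities."""
    opt = torch.optim.Adam(selector.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for p in predictor.parameters():
        p.requires_grad_(False)
    for _ in range(epochs):
        for x, y in loader:
            probs = selector(x)
            # Relaxed Bernoulli sample keeps the mask differentiable; the
            # paper's actual gradient estimator may differ.
            mask = torch.distributions.RelaxedBernoulli(
                temperature=torch.tensor(0.5), probs=probs).rsample()
            loss = loss_fn(predictor(x, mask), y) + sparsity * probs.mean()
            opt.zero_grad(); loss.backward(); opt.step()

The contrast with jointly trained methods lies in the second stage: because the predictor is frozen and was fit on random feature subsets, the selector gains nothing by hiding the label in the pattern of selected features. When the two models are instead trained in concert, the predictor can learn to read the label off the mask itself, which is the encoding failure mode the abstract describes.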

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-jethani21a,
  title = {Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations.},
  author = {Jethani, Neil and Sudarshan, Mukund and Aphinyanaphongs, Yindalon and Ranganath, Rajesh},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = {1459--1467},
  year = {2021},
  editor = {Banerjee, Arindam and Fukumizu, Kenji},
  volume = {130},
  series = {Proceedings of Machine Learning Research},
  month = {13--15 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v130/jethani21a/jethani21a.pdf},
  url = {https://proceedings.mlr.press/v130/jethani21a.html},
  abstract = {While the need for interpretable machine learning has been established, many common approaches are slow, lack fidelity, or hard to evaluate. Amortized explanation methods reduce the cost of providing interpretations by learning a global selector model that returns feature importances for a single instance of data. The selector model is trained to optimize the fidelity of the interpretations, as evaluated by a predictor model for the target. Popular methods learn the selector and predictor model in concert, which we show allows predictions to be encoded within interpretations. We introduce EVAL-X as a method to quantitatively evaluate interpretations and REAL-X as an amortized explanation method, which learn a predictor model that approximates the true data generating distribution given any subset of the input. We show EVAL-X can detect when predictions are encoded in interpretations and show the advantages of REAL-X through quantitative and radiologist evaluation.}
}
Endnote
%0 Conference Paper
%T Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations.
%A Neil Jethani
%A Mukund Sudarshan
%A Yindalon Aphinyanaphongs
%A Rajesh Ranganath
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-jethani21a
%I PMLR
%P 1459--1467
%U https://proceedings.mlr.press/v130/jethani21a.html
%V 130
%X While the need for interpretable machine learning has been established, many common approaches are slow, lack fidelity, or hard to evaluate. Amortized explanation methods reduce the cost of providing interpretations by learning a global selector model that returns feature importances for a single instance of data. The selector model is trained to optimize the fidelity of the interpretations, as evaluated by a predictor model for the target. Popular methods learn the selector and predictor model in concert, which we show allows predictions to be encoded within interpretations. We introduce EVAL-X as a method to quantitatively evaluate interpretations and REAL-X as an amortized explanation method, which learn a predictor model that approximates the true data generating distribution given any subset of the input. We show EVAL-X can detect when predictions are encoded in interpretations and show the advantages of REAL-X through quantitative and radiologist evaluation.
APA
Jethani, N., Sudarshan, M., Aphinyanaphongs, Y., & Ranganath, R. (2021). Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1459-1467. Available from https://proceedings.mlr.press/v130/jethani21a.html.
