Fairwashing explanations with off-manifold detergent

Christopher Anders, Plamen Pasliev, Ann-Kathrin Dombrowski, Klaus-Robert Müller, Pan Kessel
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:314-323, 2020.

Abstract

Explanation methods promise to make black-box classifiers more transparent. As a result, it is hoped that they can act as proof for a sensible, fair and trustworthy decision-making process of the algorithm and thereby increase its acceptance by the end-users. In this paper, we show both theoretically and experimentally that these hopes are presently unfounded. Specifically, we show that, for any classifier $g$, one can always construct another classifier $\tilde{g}$ which has the same behavior on the data (same train, validation, and test error) but has arbitrarily manipulated explanation maps. We derive this statement theoretically using differential geometry and demonstrate it experimentally for various explanation methods, architectures, and datasets. Motivated by our theoretical insights, we then propose a modification of existing explanation methods which makes them significantly more robust.
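The geometric argument in the abstract can be illustrated with a toy example. The sketch below makes several assumptions that do not come from the paper:

- the data manifold is a flat 2-D subspace of R^4, spanned by two known orthonormal vectors `U1` and `U2`;
- the original classifier score `g` is linear;
- the explanation map is the plain input gradient.

The hypothetical manipulated model `g_tilde` adds a term that is exactly zero on the manifold. It therefore agrees with `g` on every data point, yet its gradient is pushed towards an attacker-chosen `TARGET` that lies in the normal space. Projecting the gradient onto the tangent space of the manifold removes the injected component. This projection is offered as one modification that the geometric picture suggests, not as the paper's exact procedure. In practice the tangent space would have to be estimated from data rather than known in closed form.

```python
import math

# Toy assumption (not from the paper): the data manifold is the flat 2-D
# subspace of R^4 spanned by the orthonormal vectors U1 and U2.
U1 = [0.5, 0.5, 0.5, 0.5]
U2 = [0.5, -0.5, 0.5, -0.5]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def axpy(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [p + scale * q for p, q in zip(a, b)]


def project_tangent(v):
    """Orthogonal projection of v onto the tangent space span(U1, U2)."""
    out = [0.0] * len(v)
    for u in (U1, U2):
        out = axpy(out, u, dot(v, u))
    return out


def project_normal(v):
    """Component of v orthogonal to the manifold."""
    return axpy(v, project_tangent(v), -1.0)


# Hypothetical original classifier score g: a linear function.
W = [1.0, 2.0, -1.0, 0.5]


def g(x):
    return dot(W, x)


def grad_g(x):
    # g is linear, so its gradient is the constant weight vector.
    return list(W)


# Hypothetical attacker-chosen explanation; it lies in the normal space.
TARGET = [0.5, -0.5, -0.5, 0.5]
GAMMA = 100.0  # strength of the off-manifold term


def g_tilde(x):
    # <P_N t, x> = <t, P_N x>, which is 0 for every x on the manifold,
    # so g_tilde equals g on all data points.
    return g(x) + GAMMA * dot(project_normal(TARGET), x)


def grad_g_tilde(x):
    # Gradient of the linear function g_tilde above.
    return axpy(grad_g(x), project_normal(TARGET), GAMMA)


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


# A data point on the manifold: x = 3 * U1 - 2 * U2.
x = axpy(axpy([0.0] * 4, U1, 3.0), U2, -2.0)

print("g(x) =", g(x), " g_tilde(x) =", g_tilde(x))
print("cos(grad g, target)       =", cosine(grad_g(x), TARGET))
print("cos(grad g_tilde, target) =", cosine(grad_g_tilde(x), TARGET))
same_tangent = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(project_tangent(grad_g(x)), project_tangent(grad_g_tilde(x)))
)
print("tangent-projected explanations agree:", same_tangent)
```

This construction can only inject the component of a target that is orthogonal to the manifold, which is why `TARGET` is chosen in the normal space here. Off the manifold, `g_tilde` and `g` do differ, by `GAMMA * <P_N t, x>`.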

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-anders20a,
  title     = {Fairwashing explanations with off-manifold detergent},
  author    = {Anders, Christopher and Pasliev, Plamen and Dombrowski, Ann-Kathrin and M{\"u}ller, Klaus-Robert and Kessel, Pan},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {314--323},
  year      = {2020},
  editor    = {Daum{\'e} III, Hal and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/anders20a/anders20a.pdf},
  url       = {http://proceedings.mlr.press/v119/anders20a.html},
  abstract  = {Explanation methods promise to make black-box classifiers more transparent. As a result, it is hoped that they can act as proof for a sensible, fair and trustworthy decision-making process of the algorithm and thereby increase its acceptance by the end-users. In this paper, we show both theoretically and experimentally that these hopes are presently unfounded. Specifically, we show that, for any classifier $g$, one can always construct another classifier $\tilde{g}$ which has the same behavior on the data (same train, validation, and test error) but has arbitrarily manipulated explanation maps. We derive this statement theoretically using differential geometry and demonstrate it experimentally for various explanation methods, architectures, and datasets. Motivated by our theoretical insights, we then propose a modification of existing explanation methods which makes them significantly more robust.}
}
Endnote
%0 Conference Paper
%T Fairwashing explanations with off-manifold detergent
%A Christopher Anders
%A Plamen Pasliev
%A Ann-Kathrin Dombrowski
%A Klaus-Robert Müller
%A Pan Kessel
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-anders20a
%I PMLR
%P 314--323
%U http://proceedings.mlr.press/v119/anders20a.html
%V 119
%X Explanation methods promise to make black-box classifiers more transparent. As a result, it is hoped that they can act as proof for a sensible, fair and trustworthy decision-making process of the algorithm and thereby increase its acceptance by the end-users. In this paper, we show both theoretically and experimentally that these hopes are presently unfounded. Specifically, we show that, for any classifier $g$, one can always construct another classifier $\tilde{g}$ which has the same behavior on the data (same train, validation, and test error) but has arbitrarily manipulated explanation maps. We derive this statement theoretically using differential geometry and demonstrate it experimentally for various explanation methods, architectures, and datasets. Motivated by our theoretical insights, we then propose a modification of existing explanation methods which makes them significantly more robust.
APA
Anders, C., Pasliev, P., Dombrowski, A.-K., Müller, K.-R., & Kessel, P. (2020). Fairwashing explanations with off-manifold detergent. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:314-323. Available from http://proceedings.mlr.press/v119/anders20a.html
