On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box

Yi Cai, Gerhard Wunder
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:5360-5382, 2024.

Abstract

Attribution methods shed light on the explainability of data-driven approaches such as deep learning models by uncovering the most influential features in a to-be-explained decision. While determining feature attributions via gradients delivers promising results, the internal access required for acquiring gradients can be impractical under safety concerns, thus limiting the applicability of gradient-based approaches. In response to such limited flexibility, this paper presents GEEX (gradient-estimation-based explanation), a method that produces gradient-like explanations through only query-level access. The proposed approach holds a set of fundamental properties for attribution methods, which are mathematically rigorously proved, ensuring the quality of its explanations. In addition to the theoretical analysis, with a focus on image data, the experimental results empirically demonstrate the superiority of the proposed method over state-of-the-art black-box methods and its competitive performance compared to methods with full access.
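The abstract describes producing gradient-like explanations from query-level access alone. As an illustration of that general idea only (this is not the GEEX estimator from the paper, whose exact formulation is not reproduced on this page), the sketch below estimates a gradient with a standard zeroth-order Gaussian-smoothing estimator that uses antithetic perturbation pairs. The function name `estimate_gradient` and the parameters `sigma`, `n_samples` and `seed` are hypothetical choices for this example. The model is treated as a callable `f` that can only be queried.

```python
import numpy as np


def estimate_gradient(f, x, sigma=0.1, n_samples=2000, seed=0):
    """Query-only gradient estimate of a scalar black-box function f at x.

    Hypothetical illustration (not the paper's GEEX method): a generic
    Gaussian-smoothing estimator with antithetic pairs,
        g ~ mean_eps[(f(x + sigma*eps) - f(x - sigma*eps)) / (2*sigma) * eps],
    where eps ~ N(0, I). Only forward queries f(.) are used, no internals.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        # Draw one standard-normal perturbation with the input's shape.
        eps = rng.standard_normal(x.shape)
        # Two black-box queries per sample (antithetic pair reduces variance).
        diff = f(x + sigma * eps) - f(x - sigma * eps)
        grad += diff / (2.0 * sigma) * eps
    # Average over samples to approximate the expectation.
    return grad / n_samples


# Usage: a "black-box" linear scorer whose true gradient is w.
w = np.array([1.0, -2.0, 0.5])
attribution = estimate_gradient(lambda v: float(w @ v), np.zeros(3))
print(attribution)  # close to w, up to Monte Carlo noise
```

Because only `f(...)` is called, the same routine applies to any model that exposes prediction scores. Increasing `n_samples` lowers the estimator's variance at the cost of more queries, which is the central trade-off for any query-based gradient estimate.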

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-cai24h,
  title = {On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box},
  author = {Cai, Yi and Wunder, Gerhard},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {5360--5382},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/cai24h/cai24h.pdf},
  url = {https://proceedings.mlr.press/v235/cai24h.html},
  abstract = {Attribution methods shed light on the explainability of data-driven approaches such as deep learning models by uncovering the most influential features in a to-be-explained decision. While determining feature attributions via gradients delivers promising results, the internal access required for acquiring gradients can be impractical under safety concerns, thus limiting the applicability of gradient-based approaches. In response to such limited flexibility, this paper presents GEEX (gradient-estimation-based explanation), a method that produces gradient-like explanations through only query-level access. The proposed approach holds a set of fundamental properties for attribution methods, which are mathematically rigorously proved, ensuring the quality of its explanations. In addition to the theoretical analysis, with a focus on image data, the experimental results empirically demonstrate the superiority of the proposed method over state-of-the-art black-box methods and its competitive performance compared to methods with full access.}
}
Endnote
%0 Conference Paper
%T On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
%A Yi Cai
%A Gerhard Wunder
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-cai24h
%I PMLR
%P 5360--5382
%U https://proceedings.mlr.press/v235/cai24h.html
%V 235
%X Attribution methods shed light on the explainability of data-driven approaches such as deep learning models by uncovering the most influential features in a to-be-explained decision. While determining feature attributions via gradients delivers promising results, the internal access required for acquiring gradients can be impractical under safety concerns, thus limiting the applicability of gradient-based approaches. In response to such limited flexibility, this paper presents GEEX (gradient-estimation-based explanation), a method that produces gradient-like explanations through only query-level access. The proposed approach holds a set of fundamental properties for attribution methods, which are mathematically rigorously proved, ensuring the quality of its explanations. In addition to the theoretical analysis, with a focus on image data, the experimental results empirically demonstrate the superiority of the proposed method over state-of-the-art black-box methods and its competitive performance compared to methods with full access.
APA
Cai, Y. & Wunder, G. (2024). On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:5360-5382. Available from https://proceedings.mlr.press/v235/cai24h.html.
