GEFA: A General Feature Attribution Framework Using Proxy Gradient Estimation

Yi Cai, Thibaud Ardoin, Gerhard Wunder
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:6165-6192, 2025.

Abstract

Feature attribution explains machine decisions by quantifying each feature’s contribution. While numerous approaches rely on exact gradient measurements, recent work has adopted gradient estimation to derive explanatory information under query-level access, a restrictive yet more practical accessibility assumption known as the black-box setting. Following this direction, this paper introduces GEFA (Gradient-estimation-based Explanation For All), a general feature attribution framework leveraging proxy gradient estimation. Unlike the previous attempt that focused on explaining image classifiers, the proposed explainer derives feature attributions in a proxy space, making it generally applicable to arbitrary black-box models, regardless of input type. In addition to its close relationship with Integrated Gradients, our approach, a path method built upon estimated gradients, surprisingly produces unbiased estimates of Shapley Values. Compared to traditional sampling-based Shapley Value estimators, GEFA avoids potential information waste sourced from computing marginal contributions, thereby improving explanation quality, as demonstrated in quantitative evaluations across various settings.
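The two ingredients the abstract combines, gradient estimation under query-level (black-box) access and a path method in the style of Integrated Gradients, can be illustrated with a minimal sketch. This is not GEFA itself: the paper's method works in a proxy space and yields unbiased Shapley Value estimates, while the finite-difference estimator and function names below are illustrative assumptions only.

```python
import numpy as np

def estimate_gradient(f, x, eps=1e-3):
    """Two-sided finite-difference gradient estimate.

    Uses only queries to f, matching the query-level access
    assumption of the black-box setting (2 * dim queries per point).
    """
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return grad

def path_attribution(f, x, baseline, steps=50):
    """Integrated-Gradients-style attribution along the straight path
    from baseline to x, but built on *estimated* rather than exact
    gradients (midpoint-rule Riemann sum over the path)."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        point = baseline + a * (x - baseline)
        total += estimate_gradient(f, point)
    return (x - baseline) * total / steps
```

For a simple target such as `f(z) = z[0]**2 + 2*z[1]` with a zero baseline, the attributions approximately satisfy the completeness property of path methods: they sum to `f(x) - f(baseline)`.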

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-cai25a,
  title     = {{GEFA}: A General Feature Attribution Framework Using Proxy Gradient Estimation},
  author    = {Cai, Yi and Ardoin, Thibaud and Wunder, Gerhard},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {6165--6192},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/cai25a/cai25a.pdf},
  url       = {https://proceedings.mlr.press/v267/cai25a.html},
  abstract  = {Feature attribution explains machine decisions by quantifying each feature’s contribution. While numerous approaches rely on exact gradient measurements, recent work has adopted gradient estimation to derive explanatory information under query-level access, a restrictive yet more practical accessibility assumption known as the black-box setting. Following this direction, this paper introduces GEFA (Gradient-estimation-based Explanation For All), a general feature attribution framework leveraging proxy gradient estimation. Unlike the previous attempt that focused on explaining image classifiers, the proposed explainer derives feature attributions in a proxy space, making it generally applicable to arbitrary black-box models, regardless of input type. In addition to its close relationship with Integrated Gradients, our approach, a path method built upon estimated gradients, surprisingly produces unbiased estimates of Shapley Values. Compared to traditional sampling-based Shapley Value estimators, GEFA avoids potential information waste sourced from computing marginal contributions, thereby improving explanation quality, as demonstrated in quantitative evaluations across various settings.}
}
Endnote
%0 Conference Paper
%T GEFA: A General Feature Attribution Framework Using Proxy Gradient Estimation
%A Yi Cai
%A Thibaud Ardoin
%A Gerhard Wunder
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-cai25a
%I PMLR
%P 6165--6192
%U https://proceedings.mlr.press/v267/cai25a.html
%V 267
%X Feature attribution explains machine decisions by quantifying each feature’s contribution. While numerous approaches rely on exact gradient measurements, recent work has adopted gradient estimation to derive explanatory information under query-level access, a restrictive yet more practical accessibility assumption known as the black-box setting. Following this direction, this paper introduces GEFA (Gradient-estimation-based Explanation For All), a general feature attribution framework leveraging proxy gradient estimation. Unlike the previous attempt that focused on explaining image classifiers, the proposed explainer derives feature attributions in a proxy space, making it generally applicable to arbitrary black-box models, regardless of input type. In addition to its close relationship with Integrated Gradients, our approach, a path method built upon estimated gradients, surprisingly produces unbiased estimates of Shapley Values. Compared to traditional sampling-based Shapley Value estimators, GEFA avoids potential information waste sourced from computing marginal contributions, thereby improving explanation quality, as demonstrated in quantitative evaluations across various settings.
APA
Cai, Y., Ardoin, T. & Wunder, G. (2025). GEFA: A General Feature Attribution Framework Using Proxy Gradient Estimation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:6165-6192. Available from https://proceedings.mlr.press/v267/cai25a.html.