SHAFF: Fast and consistent SHApley eFfect estimates via random Forests

Clément Bénard, Gérard Biau, Sébastien Da Veiga, Erwan Scornet
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:5563-5582, 2022.

Abstract

Interpretability of learning algorithms is crucial for applications involving critical decisions, and variable importance is one of the main interpretation tools. Shapley effects are now widely used to interpret both tree ensembles and neural networks, as they can efficiently handle dependence and interactions in the data, as opposed to most other variable importance measures. However, estimating Shapley effects is a challenging task, because of their computational complexity and the need to estimate conditional expectations. Accordingly, existing Shapley algorithms have flaws: a costly running time, or a bias when input variables are dependent. Therefore, we introduce SHAFF, SHApley eFfects via random Forests, a fast and accurate Shapley effect estimate, even when input variables are dependent. We demonstrate SHAFF's efficiency through both a theoretical analysis of its consistency and extensive experiments showing practical performance improvements over competitors. An implementation of SHAFF in C++ and R is available online.
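To make the quantity being estimated concrete: the Shapley effect of an input variable averages, over all variable subsets, the increase in explained output variance Var(E[Y | X_S])/Var(Y) obtained by adding that variable. The sketch below is not SHAFF itself (the paper's algorithm relies on random forests and importance sampling of variable subsets to avoid the exponential cost); it is a brute-force Monte Carlo illustration on a toy linear model with independent Gaussian inputs, where the true Shapley effects are known in closed form (beta_i^2 / sum_j beta_j^2). All names and parameter choices here are illustrative assumptions, not from the paper.

```python
import itertools
import math
import random

random.seed(0)

p = 3
beta = [1.0, 2.0, 3.0]            # toy linear model Y = sum_i beta_i * X_i, X_i ~ N(0,1) independent
var_y = sum(b * b for b in beta)  # analytic Var(Y) under independence

def f(x):
    return sum(b * xi for b, xi in zip(beta, x))

def sample_x():
    return [random.gauss(0.0, 1.0) for _ in range(p)]

_cache = {}

def v(S, n_outer=1500, n_inner=60):
    """Monte Carlo estimate of the (unnormalized) value function Var(E[Y | X_S]).
    Complement coordinates are resampled from their marginals, which is only
    valid here because the inputs are independent -- handling dependent inputs
    correctly is precisely what makes the general problem hard."""
    key = frozenset(S)
    if key in _cache:
        return _cache[key]
    outer_means = []
    for _ in range(n_outer):
        x = sample_x()
        acc = 0.0
        for _ in range(n_inner):
            z = sample_x()
            acc += f([x[j] if j in key else z[j] for j in range(p)])
        outer_means.append(acc / n_inner)
    m = sum(outer_means) / n_outer
    _cache[key] = sum((u - m) ** 2 for u in outer_means) / n_outer
    return _cache[key]

def shapley_effect(i):
    """Exact Shapley weighting over all subsets not containing i (feasible
    only for small p; this exponential sum is what fast methods avoid)."""
    others = [j for j in range(p) if j != i]
    total = 0.0
    for k in range(len(others) + 1):
        w = math.factorial(k) * math.factorial(p - k - 1) / math.factorial(p)
        for S in itertools.combinations(others, k):
            total += w * (v(set(S) | {i}) - v(set(S)))
    return total / var_y

effects = [shapley_effect(i) for i in range(p)]
# For this additive model the true effects are beta_i^2 / sum_j beta_j^2,
# roughly [0.07, 0.29, 0.64], and the estimates sum to approximately 1.
```

Even for p = 3 this requires estimating the value function on all 2^p subsets, which is why practical estimators replace both the subset enumeration and the conditional expectation step with cheaper surrogates.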

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-benard22a,
  title     = {SHAFF: Fast and consistent SHApley eFfect estimates via random Forests},
  author    = {B\'enard, Cl\'ement and Biau, G\'erard and Da Veiga, S\'ebastien and Scornet, Erwan},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {5563--5582},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/benard22a/benard22a.pdf},
  url       = {https://proceedings.mlr.press/v151/benard22a.html},
  abstract  = {Interpretability of learning algorithms is crucial for applications involving critical decisions, and variable importance is one of the main interpretation tools. Shapley effects are now widely used to interpret both tree ensembles and neural networks, as they can efficiently handle dependence and interactions in the data, as opposed to most other variable importance measures. However, estimating Shapley effects is a challenging task, because of the computational complexity and the conditional expectation estimates. Accordingly, existing Shapley algorithms have flaws: a costly running time, or a bias when input variables are dependent. Therefore, we introduce SHAFF, SHApley eFfects via random Forests, a fast and accurate Shapley effect estimate, even when input variables are dependent. We show SHAFF efficiency through both a theoretical analysis of its consistency, and the practical performance improvements over competitors with extensive experiments. An implementation of SHAFF in C++ and R is available online.}
}
Endnote
%0 Conference Paper
%T SHAFF: Fast and consistent SHApley eFfect estimates via random Forests
%A Clément Bénard
%A Gérard Biau
%A Sébastien Da Veiga
%A Erwan Scornet
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-benard22a
%I PMLR
%P 5563--5582
%U https://proceedings.mlr.press/v151/benard22a.html
%V 151
%X Interpretability of learning algorithms is crucial for applications involving critical decisions, and variable importance is one of the main interpretation tools. Shapley effects are now widely used to interpret both tree ensembles and neural networks, as they can efficiently handle dependence and interactions in the data, as opposed to most other variable importance measures. However, estimating Shapley effects is a challenging task, because of the computational complexity and the conditional expectation estimates. Accordingly, existing Shapley algorithms have flaws: a costly running time, or a bias when input variables are dependent. Therefore, we introduce SHAFF, SHApley eFfects via random Forests, a fast and accurate Shapley effect estimate, even when input variables are dependent. We show SHAFF efficiency through both a theoretical analysis of its consistency, and the practical performance improvements over competitors with extensive experiments. An implementation of SHAFF in C++ and R is available online.
APA
Bénard, C., Biau, G., Da Veiga, S. & Scornet, E. (2022). SHAFF: Fast and consistent SHApley eFfect estimates via random Forests. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:5563-5582. Available from https://proceedings.mlr.press/v151/benard22a.html.