Improved feature importance computation for tree models based on the Banzhaf value
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:969-979, 2022.
Abstract
The Shapley value – a fundamental game-theoretic solution concept – has recently become one of the main tools used to explain predictions of tree ensemble models. Another well-known game-theoretic solution concept is the Banzhaf value. Although the Banzhaf value is closely related to the Shapley value, its properties w.r.t. feature attribution have not been understood equally well. This paper shows that, for tree ensemble models, the Banzhaf value offers some crucial advantages over the Shapley value while providing similar feature attributions. In particular, we first give an optimal O(TL + n) time algorithm for computing the Banzhaf value-based attribution of a tree ensemble model’s output. Here, T is the number of trees, L is the maximum number of leaves in a tree, and n is the number of features. In comparison, the state-of-the-art Shapley value-based algorithm runs in O(TLD^2 + n) time, where D denotes the maximum depth of a tree in the ensemble. Next, we experimentally compare the Banzhaf and Shapley values for tree ensemble models. Both methods deliver essentially the same average importance scores for the studied datasets using two different tree ensemble models (the sklearn implementation of Decision Trees and the xgboost implementation of Gradient Boosting Decision Trees). However, our results indicate that, in addition to being faster to compute, the Banzhaf value is more numerically robust than the Shapley value.
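For reference, the two solution concepts contrasted above are, in their standard cooperative-game form (a sketch in our own notation, not the paper's tree-specific algorithm: a game $(N, v)$ with $n = |N|$ players, where in the feature-attribution setting $N$ is the feature set and $v(S)$ is a suitably defined model output restricted to the feature subset $S$):

\[
\beta_i(v) \;=\; \frac{1}{2^{\,n-1}} \sum_{S \subseteq N \setminus \{i\}} \bigl( v(S \cup \{i\}) - v(S) \bigr) \qquad \text{(Banzhaf value)}
\]

\[
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-1-|S|)!}{n!} \, \bigl( v(S \cup \{i\}) - v(S) \bigr) \qquad \text{(Shapley value)}
\]

Both attribute to feature $i$ a weighted average of its marginal contributions $v(S \cup \{i\}) - v(S)$ over coalitions $S$ not containing $i$; they differ only in the coalition weights, which are uniform for the Banzhaf value and permutation-based for the Shapley value.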