Disentangling Interactions and Dependencies in Feature Attributions

Gunnar König, Eric Günther, Ulrike von Luxburg
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2134-2142, 2025.

Abstract

In explainable machine learning, global feature importance methods try to determine how much each individual feature contributes to predicting the target variable, resulting in one importance score for each feature. But often, predicting the target variable requires interactions between several features (as in the XOR function), and features might have complex statistical dependencies that make it possible to partially replace one feature with another. In commonly used feature importance scores, these cooperative effects are conflated with the features’ individual contributions, making the scores prone to misinterpretation. In this work, we derive DIP, a new mathematical decomposition of individual feature importance scores that disentangles three components: the standalone contribution and the contributions stemming from interactions and dependencies. We prove that the DIP decomposition is unique and show how it can be estimated in practice. Based on these results, we propose a new visualization of feature importance scores that clearly illustrates the different contributions.
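
The XOR example makes the conflation concrete. Below is a minimal Python sketch (a toy illustration, not the paper's DIP estimator): with two independent binary features, each feature alone explains none of the variance of the XOR target, while both together explain it fully, so any per-feature importance score for this model necessarily mixes interaction effects into the individual attributions.

# Toy illustration of the conflation (not the paper's DIP estimator):
# for y = XOR(x1, x2), each feature alone is uninformative, yet both
# together predict y perfectly; the performance is pure interaction.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.integers(0, 2, n)  # independent binary features
x2 = rng.integers(0, 2, n)
y = np.logical_xor(x1, x2).astype(float)

def explained_variance(keys, y):
    # Predict y by its conditional mean within each group of `keys`,
    # then report the fraction of variance explained (an R^2-style score).
    pred = np.zeros_like(y)
    for k in np.unique(keys):
        mask = keys == k
        pred[mask] = y[mask].mean()
    return 1.0 - np.var(y - pred) / np.var(y)

v1 = explained_variance(x1, y)            # ~0.0: x1 alone is useless
v2 = explained_variance(x2, y)            # ~0.0: x2 alone is useless
v12 = explained_variance(2 * x1 + x2, y)  # ~1.0: together they determine y
print(v1, v2, v12)
print("interaction surplus:", v12 - v1 - v2)  # ~1.0 here

Because x1 and x2 are independent in this sketch, the surplus v12 - v1 - v2 is driven entirely by the interaction; with statistically dependent features it would also absorb dependency effects, which is the kind of entanglement the DIP decomposition is designed to separate.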

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-konig25a,
  title = {Disentangling Interactions and Dependencies in Feature Attributions},
  author = {K{\"o}nig, Gunnar and G{\"u}nther, Eric and von Luxburg, Ulrike},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {2134--2142},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/konig25a/konig25a.pdf},
  url = {https://proceedings.mlr.press/v258/konig25a.html},
  abstract = {In explainable machine learning, global feature importance methods try to determine how much each individual feature contributes to predicting the target variable, resulting in one importance score for each feature. But often, predicting the target variable requires interactions between several features (as in the XOR function), and features might have complex statistical dependencies that make it possible to partially replace one feature with another. In commonly used feature importance scores, these cooperative effects are conflated with the features’ individual contributions, making the scores prone to misinterpretation. In this work, we derive DIP, a new mathematical decomposition of individual feature importance scores that disentangles three components: the standalone contribution and the contributions stemming from interactions and dependencies. We prove that the DIP decomposition is unique and show how it can be estimated in practice. Based on these results, we propose a new visualization of feature importance scores that clearly illustrates the different contributions.}
}
Endnote
%0 Conference Paper
%T Disentangling Interactions and Dependencies in Feature Attributions
%A Gunnar König
%A Eric Günther
%A Ulrike von Luxburg
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-konig25a
%I PMLR
%P 2134--2142
%U https://proceedings.mlr.press/v258/konig25a.html
%V 258
%X In explainable machine learning, global feature importance methods try to determine how much each individual feature contributes to predicting the target variable, resulting in one importance score for each feature. But often, predicting the target variable requires interactions between several features (as in the XOR function), and features might have complex statistical dependencies that make it possible to partially replace one feature with another. In commonly used feature importance scores, these cooperative effects are conflated with the features’ individual contributions, making the scores prone to misinterpretation. In this work, we derive DIP, a new mathematical decomposition of individual feature importance scores that disentangles three components: the standalone contribution and the contributions stemming from interactions and dependencies. We prove that the DIP decomposition is unique and show how it can be estimated in practice. Based on these results, we propose a new visualization of feature importance scores that clearly illustrates the different contributions.
APA
König, G., Günther, E. & von Luxburg, U. (2025). Disentangling Interactions and Dependencies in Feature Attributions. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2134-2142. Available from https://proceedings.mlr.press/v258/konig25a.html.
