Leveraging Predictive Equivalence in Decision Trees

Hayden Mctavish, Zachery Boner, Jon Donnelly, Margo Seltzer, Cynthia Rudin
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:43440-43475, 2025.

Abstract

Decision trees are widely used for interpretable machine learning due to their clearly structured reasoning process. However, this structure belies a challenge we refer to as predictive equivalence: a given tree’s decision boundary can be represented by many different decision trees. The presence of models with identical decision boundaries but different evaluation processes makes model selection challenging. The models will have different variable importance and behave differently in the presence of missing values, but most optimization procedures will arbitrarily choose one such model to return. We present a boolean logical representation of decision trees that does not exhibit predictive equivalence and is faithful to the underlying decision boundary. We apply our representation to several downstream machine learning tasks. Using our representation, we show that decision trees are surprisingly robust to test-time missingness of feature values; we address predictive equivalence’s impact on quantifying variable importance; and we present an algorithm to optimize the cost of reaching predictions.
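The phenomenon the abstract describes can be shown with a minimal sketch (a hypothetical example, not the authors' code): two trees that realize the same decision boundary, x1 OR x2, but split on the features in different orders, so they behave differently when a feature value is missing at test time.

```python
# Two decision trees with the SAME decision boundary (x1 OR x2)
# but DIFFERENT evaluation processes -- they are predictively equivalent.

MISSING = None  # sentinel for a missing feature value

def tree_a(x1, x2):
    """Splits on x1 at the root, then on x2."""
    if x1 is MISSING:
        return MISSING          # cannot evaluate the root split
    if x1:
        return 1                # x2 is never examined on this path
    if x2 is MISSING:
        return MISSING
    return 1 if x2 else 0

def tree_b(x1, x2):
    """Splits on x2 at the root, then on x1 -- equivalent boundary."""
    if x2 is MISSING:
        return MISSING          # stalls immediately if x2 is missing
    if x2:
        return 1
    if x1 is MISSING:
        return MISSING
    return 1 if x1 else 0

# Identical predictions on every complete input ...
for x1 in (0, 1):
    for x2 in (0, 1):
        assert tree_a(x1, x2) == tree_b(x1, x2)

# ... yet they diverge when x2 is missing and x1 = 1:
print(tree_a(1, MISSING))  # 1: the prediction never needed x2
print(tree_b(1, MISSING))  # None: blocked at the root split
```

An optimizer that returns either tree arbitrarily would thus also arbitrarily fix the model's missing-value behavior and variable importances, which is the model-selection problem the paper's boundary-faithful Boolean representation is designed to remove.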

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-mctavish25a,
  title     = {Leveraging Predictive Equivalence in Decision Trees},
  author    = {Mctavish, Hayden and Boner, Zachery and Donnelly, Jon and Seltzer, Margo and Rudin, Cynthia},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {43440--43475},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/mctavish25a/mctavish25a.pdf},
  url       = {https://proceedings.mlr.press/v267/mctavish25a.html},
  abstract  = {Decision trees are widely used for interpretable machine learning due to their clearly structured reasoning process. However, this structure belies a challenge we refer to as predictive equivalence: a given tree's decision boundary can be represented by many different decision trees. The presence of models with identical decision boundaries but different evaluation processes makes model selection challenging. The models will have different variable importance and behave differently in the presence of missing values, but most optimization procedures will arbitrarily choose one such model to return. We present a boolean logical representation of decision trees that does not exhibit predictive equivalence and is faithful to the underlying decision boundary. We apply our representation to several downstream machine learning tasks. Using our representation, we show that decision trees are surprisingly robust to test-time missingness of feature values; we address predictive equivalence's impact on quantifying variable importance; and we present an algorithm to optimize the cost of reaching predictions.}
}
Endnote
%0 Conference Paper
%T Leveraging Predictive Equivalence in Decision Trees
%A Hayden Mctavish
%A Zachery Boner
%A Jon Donnelly
%A Margo Seltzer
%A Cynthia Rudin
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-mctavish25a
%I PMLR
%P 43440--43475
%U https://proceedings.mlr.press/v267/mctavish25a.html
%V 267
%X Decision trees are widely used for interpretable machine learning due to their clearly structured reasoning process. However, this structure belies a challenge we refer to as predictive equivalence: a given tree’s decision boundary can be represented by many different decision trees. The presence of models with identical decision boundaries but different evaluation processes makes model selection challenging. The models will have different variable importance and behave differently in the presence of missing values, but most optimization procedures will arbitrarily choose one such model to return. We present a boolean logical representation of decision trees that does not exhibit predictive equivalence and is faithful to the underlying decision boundary. We apply our representation to several downstream machine learning tasks. Using our representation, we show that decision trees are surprisingly robust to test-time missingness of feature values; we address predictive equivalence’s impact on quantifying variable importance; and we present an algorithm to optimize the cost of reaching predictions.
APA
Mctavish, H., Boner, Z., Donnelly, J., Seltzer, M. & Rudin, C. (2025). Leveraging Predictive Equivalence in Decision Trees. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:43440-43475. Available from https://proceedings.mlr.press/v267/mctavish25a.html.