Learning Binary Decision Trees by Argmin Differentiation

Valentina Zantedeschi, Matt Kusner, Vlad Niculae
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12298-12309, 2021.

Abstract

We address the problem of learning binary decision trees that partition data for some downstream task. We propose to learn discrete parameters (i.e., for tree traversals and node pruning) and continuous parameters (i.e., for tree split functions and prediction functions) simultaneously using argmin differentiation. We do so by sparsely relaxing a mixed-integer program for the discrete parameters, to allow gradients to pass through the program to continuous parameters. We derive customized algorithms to efficiently compute the forward and backward passes. This means that our tree learning procedure can be used as an (implicit) layer in arbitrary deep networks, and can be optimized with arbitrary loss functions. We demonstrate that our approach produces binary trees that are competitive with existing single tree and ensemble approaches, in both supervised and unsupervised settings. Further, apart from greedy approaches (which do not have competitive accuracies), our method is faster to train than all other tree-learning baselines we compare with.
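To make the core mechanism described in the abstract concrete, the sketch below illustrates argmin differentiation through a sparse relaxation of a single binary traversal decision. This is a minimal conceptual illustration, not the authors' released implementation: the class name SoftTraversal, the box-constrained quadratic relaxation, and the toy training loop are illustrative assumptions. The idea shown is that a binary routing variable z in {0,1} is relaxed to z*(q) = argmin over z in [0,1] of 0.5*(z - q)^2, whose closed form is clip(q, 0, 1); because this argmin is differentiable almost everywhere, gradients flow back to the continuous split parameters that produce q.

import torch

class SoftTraversal(torch.nn.Module):
    """Relaxed left/right routing decision for one internal tree node (illustrative)."""
    def __init__(self, in_dim):
        super().__init__()
        self.split = torch.nn.Linear(in_dim, 1)  # continuous split parameters

    def forward(self, x):
        q = self.split(x)  # raw split score
        # argmin_{z in [0,1]} 0.5*(z - q)^2 has the closed form clip(q, 0, 1);
        # torch.clamp is differentiable a.e., so autograd recovers the implicit gradient.
        z = torch.clamp(q, 0.0, 1.0)
        return z  # relaxed weight of routing to the right child

# Toy usage: route points through one node and fit leaf predictions end-to-end.
torch.manual_seed(0)
x = torch.randn(32, 5)
y = torch.randn(32, 1)
node = SoftTraversal(5)
leaves = torch.nn.Parameter(torch.zeros(2, 1))  # one prediction per child
opt = torch.optim.SGD(list(node.parameters()) + [leaves], lr=0.1)
for _ in range(100):
    z = node(x)                                 # (32, 1) relaxed routing weights
    pred = z * leaves[1] + (1 - z) * leaves[0]  # mix the two child predictions
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

In the paper itself the relaxed quantities are the solution of a larger quadratic program over all traversal and pruning variables, solved and differentiated with customized forward and backward passes; the single-node clip above is only meant to convey why a sparse relaxation lets gradients reach the split and prediction parameters.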

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-zantedeschi21a,
  title     = {Learning Binary Decision Trees by Argmin Differentiation},
  author    = {Zantedeschi, Valentina and Kusner, Matt and Niculae, Vlad},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {12298--12309},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/zantedeschi21a/zantedeschi21a.pdf},
  url       = {https://proceedings.mlr.press/v139/zantedeschi21a.html},
  abstract  = {We address the problem of learning binary decision trees that partition data for some downstream task. We propose to learn discrete parameters (i.e., for tree traversals and node pruning) and continuous parameters (i.e., for tree split functions and prediction functions) simultaneously using argmin differentiation. We do so by sparsely relaxing a mixed-integer program for the discrete parameters, to allow gradients to pass through the program to continuous parameters. We derive customized algorithms to efficiently compute the forward and backward passes. This means that our tree learning procedure can be used as an (implicit) layer in arbitrary deep networks, and can be optimized with arbitrary loss functions. We demonstrate that our approach produces binary trees that are competitive with existing single tree and ensemble approaches, in both supervised and unsupervised settings. Further, apart from greedy approaches (which do not have competitive accuracies), our method is faster to train than all other tree-learning baselines we compare with.}
}
Endnote
%0 Conference Paper
%T Learning Binary Decision Trees by Argmin Differentiation
%A Valentina Zantedeschi
%A Matt Kusner
%A Vlad Niculae
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-zantedeschi21a
%I PMLR
%P 12298--12309
%U https://proceedings.mlr.press/v139/zantedeschi21a.html
%V 139
%X We address the problem of learning binary decision trees that partition data for some downstream task. We propose to learn discrete parameters (i.e., for tree traversals and node pruning) and continuous parameters (i.e., for tree split functions and prediction functions) simultaneously using argmin differentiation. We do so by sparsely relaxing a mixed-integer program for the discrete parameters, to allow gradients to pass through the program to continuous parameters. We derive customized algorithms to efficiently compute the forward and backward passes. This means that our tree learning procedure can be used as an (implicit) layer in arbitrary deep networks, and can be optimized with arbitrary loss functions. We demonstrate that our approach produces binary trees that are competitive with existing single tree and ensemble approaches, in both supervised and unsupervised settings. Further, apart from greedy approaches (which do not have competitive accuracies), our method is faster to train than all other tree-learning baselines we compare with.
APA
Zantedeschi, V., Kusner, M. & Niculae, V. (2021). Learning Binary Decision Trees by Argmin Differentiation. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12298-12309. Available from https://proceedings.mlr.press/v139/zantedeschi21a.html.