Gradient Boosted Decision Trees for High Dimensional Sparse Output

Si Si, Huan Zhang, S. Sathiya Keerthi, Dhruv Mahajan, Inderjit S. Dhillon, Cho-Jui Hsieh
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3182-3190, 2017.

Abstract

In this paper, we study gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is an $L$-dimensional 0/1 vector, where $L$ is the number of labels and can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or require impractically long running times in this regime, and propose a new GBDT variant, GBDT-SPARSE, to resolve this problem by employing $L_0$ regularization. We then discuss in detail how to exploit this sparsity during GBDT training, including splitting the nodes, computing the sparse residual, and predicting in sublinear time. Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves an order-of-magnitude improvement in model size and prediction time over existing methods, while yielding similar performance.
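The $L_0$-regularized leaf fitting mentioned in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `sparse_leaf_value`, the shrinkage parameter `reg`, and the exact shrinkage form are assumptions. The key idea is that under a hard $L_0$ (top-$k$) constraint with squared loss, the optimal sparse leaf vector is the dense least-squares leaf value projected onto its $k$ largest-magnitude coordinates:

```python
import numpy as np

def sparse_leaf_value(residual_sum, count, k, reg=1.0):
    """Dense least-squares leaf value over L labels, then keep only
    the k largest-magnitude entries (the L0-constrained optimum)."""
    dense = residual_sum / (count + reg)           # shrunken mean of residuals
    if k >= dense.size:
        return dense
    idx = np.argpartition(np.abs(dense), -k)[-k:]  # indices of k largest entries
    sparse = np.zeros_like(dense)
    sparse[idx] = dense[idx]
    return sparse

# Toy example: L = 8 labels, keep at most k = 2 nonzeros per leaf.
g_sum = np.array([0.5, -10.0, 0.25, 7.5, 0.0, -1.0, 1.5, -0.5])
v = sparse_leaf_value(g_sum, count=5, k=2)
```

Each leaf then stores at most $k$ (index, value) pairs instead of a dense $L$-vector, which is how model size and prediction cost can scale with $k$ rather than $L$.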

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-si17a,
  title     = {Gradient Boosted Decision Trees for High Dimensional Sparse Output},
  author    = {Si Si and Huan Zhang and S. Sathiya Keerthi and Dhruv Mahajan and Inderjit S. Dhillon and Cho-Jui Hsieh},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {3182--3190},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/si17a/si17a.pdf},
  url       = {https://proceedings.mlr.press/v70/si17a.html},
  abstract  = {In this paper, we study gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is an $L$-dimensional 0/1 vector, where $L$ is the number of labels and can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or require impractically long running times in this regime, and propose a new GBDT variant, GBDT-SPARSE, to resolve this problem by employing $L_0$ regularization. We then discuss in detail how to exploit this sparsity during GBDT training, including splitting the nodes, computing the sparse residual, and predicting in sublinear time. Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves an order-of-magnitude improvement in model size and prediction time over existing methods, while yielding similar performance.}
}
Endnote
%0 Conference Paper
%T Gradient Boosted Decision Trees for High Dimensional Sparse Output
%A Si Si
%A Huan Zhang
%A S. Sathiya Keerthi
%A Dhruv Mahajan
%A Inderjit S. Dhillon
%A Cho-Jui Hsieh
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-si17a
%I PMLR
%P 3182--3190
%U https://proceedings.mlr.press/v70/si17a.html
%V 70
%X In this paper, we study gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is an $L$-dimensional 0/1 vector, where $L$ is the number of labels and can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or require impractically long running times in this regime, and propose a new GBDT variant, GBDT-SPARSE, to resolve this problem by employing $L_0$ regularization. We then discuss in detail how to exploit this sparsity during GBDT training, including splitting the nodes, computing the sparse residual, and predicting in sublinear time. Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves an order-of-magnitude improvement in model size and prediction time over existing methods, while yielding similar performance.
APA
Si, S., Zhang, H., Keerthi, S.S., Mahajan, D., Dhillon, I.S. & Hsieh, C.-J. (2017). Gradient Boosted Decision Trees for High Dimensional Sparse Output. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:3182-3190. Available from https://proceedings.mlr.press/v70/si17a.html.