CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning

Wissam Siblini, Pascale Kuntz, Frank Meyer
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4664-4673, 2018.

Abstract

Extreme Multi-label Learning (XML) considers large sets of items described by a number of labels that can exceed one million. Tree-based methods, which hierarchically partition the problem into small-scale sub-problems, are particularly promising in this context because they reduce the learning/prediction complexity and open the way to parallelization. However, the current best approaches do not exploit tree randomization, which has shown its efficiency in random forests, and they resort to complex partitioning strategies. To overcome these limits, we introduce CRAFTML, a new random-forest-based algorithm with a very fast partitioning approach. Experimental comparisons on nine datasets from the XML literature show that it outperforms the other tree-based approaches. Moreover, with a parallelized implementation limited to five cores, it is competitive with the best state-of-the-art methods, which run on one-hundred-core machines.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-siblini18a,
  title = {{CRAFTML}, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning},
  author = {Siblini, Wissam and Kuntz, Pascale and Meyer, Frank},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages = {4664--4673},
  year = {2018},
  editor = {Jennifer Dy and Andreas Krause},
  volume = {80},
  series = {Proceedings of Machine Learning Research},
  address = {Stockholmsmässan, Stockholm Sweden},
  month = {10--15 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v80/siblini18a/siblini18a.pdf},
  url = {http://proceedings.mlr.press/v80/siblini18a.html},
  abstract = {Extreme Multi-label Learning (XML) considers large sets of items described by a number of labels that can exceed one million. Tree-based methods, which hierarchically partition the problem into small scale sub-problems, are particularly promising in this context to reduce the learning/prediction complexity and to open the way to parallelization. However, the current best approaches do not exploit tree randomization which has shown its efficiency in random forests and they resort to complex partitioning strategies. To overcome these limits, we here introduce a new random forest based algorithm with a very fast partitioning approach called CRAFTML. Experimental comparisons on nine datasets from the XML literature show that it outperforms the other tree-based approaches. Moreover with a parallelized implementation reduced to five cores, it is competitive with the best state-of-the-art methods which run on one hundred-core machines.}
}
Endnote
%0 Conference Paper
%T CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning
%A Wissam Siblini
%A Pascale Kuntz
%A Frank Meyer
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-siblini18a
%I PMLR
%J Proceedings of Machine Learning Research
%P 4664--4673
%U http://proceedings.mlr.press
%V 80
%W PMLR
%X Extreme Multi-label Learning (XML) considers large sets of items described by a number of labels that can exceed one million. Tree-based methods, which hierarchically partition the problem into small scale sub-problems, are particularly promising in this context to reduce the learning/prediction complexity and to open the way to parallelization. However, the current best approaches do not exploit tree randomization which has shown its efficiency in random forests and they resort to complex partitioning strategies. To overcome these limits, we here introduce a new random forest based algorithm with a very fast partitioning approach called CRAFTML. Experimental comparisons on nine datasets from the XML literature show that it outperforms the other tree-based approaches. Moreover with a parallelized implementation reduced to five cores, it is competitive with the best state-of-the-art methods which run on one hundred-core machines.
APA
Siblini, W., Kuntz, P. & Meyer, F. (2018). CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:4664-4673.
