Exploiting Categorical Structure Using Tree-Based Methods

Brian Lucena
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2949-2958, 2020.

Abstract

Standard methods of using categorical variables as predictors either endow them with an ordinal structure or assume they have no structure at all. However, categorical variables often possess structure that is more complicated than a linear ordering can capture. We develop a mathematical framework for representing the structure of categorical variables and show how to generalize decision trees to make use of this structure. This approach is applicable to methods such as Gradient Boosted Trees which use a decision tree as the underlying learner. We show results on weather data to demonstrate the improvement yielded by this approach.
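As a rough illustration of structure that is richer than a linear ordering, the sketch below assumes the structure of a categorical variable is given as an undirected graph over its values, and it keeps only those binary splits in which both sides are connected in that graph. This is one possible reading of the idea, not the paper's algorithm. The graph encoding, the function names `is_connected` and `structured_splits`, and the compass-direction example are all hypothetical.

```python
from itertools import combinations


def is_connected(nodes, adjacency):
    """Return True if `nodes` induce a connected subgraph of `adjacency`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor in nodes and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


def structured_splits(adjacency):
    """Enumerate binary splits of the category values in which both
    sides are connected in the (assumed) structure graph."""
    values = sorted(adjacency)
    anchor, rest = values[0], values[1:]
    splits = []
    # Keep `anchor` on the left side so mirrored splits are not counted twice;
    # stop before size == len(rest) so the right side is never empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {anchor, *extra}
            right = set(values) - left
            if is_connected(left, adjacency) and is_connected(right, adjacency):
                splits.append((frozenset(left), frozenset(right)))
    return splits


# Hypothetical example: four compass directions under three assumed structures.
ordinal = {"N": ["E"], "E": ["N", "S"], "S": ["E", "W"], "W": ["S"]}  # path
circular = {"N": ["E", "W"], "E": ["N", "S"], "S": ["E", "W"], "W": ["S", "N"]}  # cycle
unstructured = {v: [u for u in "NESW" if u != v] for v in "NESW"}  # complete graph

n_ordinal = len(structured_splits(ordinal))
n_circular = len(structured_splits(circular))
n_unstructured = len(structured_splits(unstructured))
```

Under these assumptions, the path graph reproduces the usual ordinal threshold splits, the complete graph allows every binary partition, and the cycle sits in between: it permits splits such as {N, E, W} versus {S} that no single linear ordering of the four values allows, while still ruling out {N, S} versus {E, W}.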

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-lucena20a,
  title     = {Exploiting Categorical Structure Using Tree-Based Methods},
  author    = {Lucena, Brian},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {2949--2958},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/lucena20a/lucena20a.pdf},
  url       = {https://proceedings.mlr.press/v108/lucena20a.html},
  abstract  = {Standard methods of using categorical variables as predictors either endow them with an ordinal structure or assume they have no structure at all. However, categorical variables often possess structure that is more complicated than a linear ordering can capture. We develop a mathematical framework for representing the structure of categorical variables and show how to generalize decision trees to make use of this structure. This approach is applicable to methods such as Gradient Boosted Trees which use a decision tree as the underlying learner. We show results on weather data to demonstrate the improvement yielded by this approach.}
}
Endnote
%0 Conference Paper
%T Exploiting Categorical Structure Using Tree-Based Methods
%A Brian Lucena
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-lucena20a
%I PMLR
%P 2949--2958
%U https://proceedings.mlr.press/v108/lucena20a.html
%V 108
%X Standard methods of using categorical variables as predictors either endow them with an ordinal structure or assume they have no structure at all. However, categorical variables often possess structure that is more complicated than a linear ordering can capture. We develop a mathematical framework for representing the structure of categorical variables and show how to generalize decision trees to make use of this structure. This approach is applicable to methods such as Gradient Boosted Trees which use a decision tree as the underlying learner. We show results on weather data to demonstrate the improvement yielded by this approach.
APA
Lucena, B. (2020). Exploiting Categorical Structure Using Tree-Based Methods. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2949-2958. Available from https://proceedings.mlr.press/v108/lucena20a.html.
