Incorporating Grouping Information into Bayesian Decision Tree Ensembles

Junliang Du, Antonio Linero
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1686-1695, 2019.

Abstract

We consider the problem of nonparametric regression in the high-dimensional setting in which $P \gg N$. We study the use of overlapping group structures to improve prediction and variable selection. These structures arise commonly when analyzing DNA microarray data, where genes can naturally be grouped according to genetic pathways. We incorporate overlapping group structure into a Bayesian additive regression trees model using a prior constructed so that, if a variable from some group is used to construct a split, this increases the probability that subsequent splits will use predictors from the same group. We refer to our model as an overlapping group Bayesian additive regression trees (OG-BART) model, and our prior on the splits an overlapping group Dirichlet (OG-Dirichlet) prior. Like the sparse group lasso, our prior encourages sparsity both within and between groups. We study the correlation structure of the prior, illustrate the proposed methodology on simulated data, and apply the methodology to gene expression data to learn which genetic pathways are predictive of breast cancer tumor metastasis.
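The abstract describes the OG-Dirichlet prior only qualitatively. The following is a minimal Python sketch of one plausible construction, built on assumptions that do not come from the paper:

- Group weights `pi` are drawn from a symmetric Dirichlet with concentration `alpha / G`.
- Within-group weights are drawn from a symmetric Dirichlet with concentration `gamma / |g|`.
- The splitting probability of variable j is `s_j = sum over groups g containing j of pi_g * w_{g,j}`.

The function names and the hyperparameters `alpha` and `gamma` are hypothetical, and the paper's exact prior may differ. The second function shows the Dirichlet-multinomial (Pólya urn) mechanism that makes a split from a group raise the predictive probability of further splits from that group. For simplicity it assumes the group label of each past split is observed; in the mixture these labels are latent.

```python
import numpy as np


def draw_og_dirichlet_weights(groups, P, alpha=1.0, gamma=1.0, rng=None):
    """Draw splitting probabilities s (length P) from a HYPOTHETICAL
    two-level Dirichlet mixture over possibly overlapping groups.

    groups: list of lists of variable indices (a variable may appear in
            several groups, i.e. groups can overlap).
    Returns (pi, s): group weights and per-variable split probabilities.
    """
    rng = np.random.default_rng() if rng is None else rng
    G = len(groups)
    # Group-level weights: small alpha / G encourages sparsity between groups.
    pi = rng.dirichlet(np.full(G, alpha / G))
    s = np.zeros(P)
    for g, members in enumerate(groups):
        # Within-group weights: small gamma / |g| encourages sparsity within groups.
        w = rng.dirichlet(np.full(len(members), gamma / len(members)))
        # A variable shared by several groups accumulates mass from each of them.
        s[members] += pi[g] * w
    return pi, s  # s sums to 1 because each w and pi sum to 1


def group_predictive(split_groups, G, alpha=1.0):
    """Posterior predictive probability that the next split comes from each
    group, given the group labels of past splits and pi ~ Dirichlet(alpha/G).
    Uses Dirichlet-multinomial conjugacy (a Polya urn)."""
    counts = np.bincount(np.asarray(split_groups, dtype=int), minlength=G)
    return (alpha / G + counts) / (alpha + counts.sum())


# Usage: variable 2 belongs to both group 0 and group 1 (overlap).
groups = [[0, 1, 2], [2, 3], [4, 5, 6, 7]]
pi, s = draw_og_dirichlet_weights(groups, P=8, rng=np.random.default_rng(0))
prior = group_predictive([], G=3)      # no splits yet: uniform over groups
after = group_predictive([0, 0], G=3)  # two splits from group 0: group 0 favored
```

With `alpha = 1` and three groups, the predictive probability of group 0 rises from 1/3 before any splits to 7/9 after two splits drawn from it. This is the qualitative "rich get richer" behavior the abstract describes.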

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-du19d,
  title = {Incorporating Grouping Information into {B}ayesian Decision Tree Ensembles},
  author = {Du, Junliang and Linero, Antonio},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {1686--1695},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/du19d/du19d.pdf},
  url = {https://proceedings.mlr.press/v97/du19d.html},
  abstract = {We consider the problem of nonparametric regression in the high-dimensional setting in which $P \gg N$. We study the use of overlapping group structures to improve prediction and variable selection. These structures arise commonly when analyzing DNA microarray data, where genes can naturally be grouped according to genetic pathways. We incorporate overlapping group structure into a Bayesian additive regression trees model using a prior constructed so that, if a variable from some group is used to construct a split, this increases the probability that subsequent splits will use predictors from the same group. We refer to our model as an overlapping group Bayesian additive regression trees (OG-BART) model, and our prior on the splits an overlapping group Dirichlet (OG-Dirichlet) prior. Like the sparse group lasso, our prior encourages sparsity both within and between groups. We study the correlation structure of the prior, illustrate the proposed methodology on simulated data, and apply the methodology to gene expression data to learn which genetic pathways are predictive of breast cancer tumor metastasis.}
}
Endnote
%0 Conference Paper
%T Incorporating Grouping Information into Bayesian Decision Tree Ensembles
%A Junliang Du
%A Antonio Linero
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-du19d
%I PMLR
%P 1686--1695
%U https://proceedings.mlr.press/v97/du19d.html
%V 97
%X We consider the problem of nonparametric regression in the high-dimensional setting in which $P \gg N$. We study the use of overlapping group structures to improve prediction and variable selection. These structures arise commonly when analyzing DNA microarray data, where genes can naturally be grouped according to genetic pathways. We incorporate overlapping group structure into a Bayesian additive regression trees model using a prior constructed so that, if a variable from some group is used to construct a split, this increases the probability that subsequent splits will use predictors from the same group. We refer to our model as an overlapping group Bayesian additive regression trees (OG-BART) model, and our prior on the splits an overlapping group Dirichlet (OG-Dirichlet) prior. Like the sparse group lasso, our prior encourages sparsity both within and between groups. We study the correlation structure of the prior, illustrate the proposed methodology on simulated data, and apply the methodology to gene expression data to learn which genetic pathways are predictive of breast cancer tumor metastasis.
APA
Du, J., & Linero, A. (2019). Incorporating Grouping Information into Bayesian Decision Tree Ensembles. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1686-1695. Available from https://proceedings.mlr.press/v97/du19d.html.
