On Theory for BART

Veronika Ročková, Enakshi Saha
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2839-2848, 2019.

Abstract

Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying down foundation for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into the branching processes theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes using their connection to random walks. We conclude with a result stating optimal rate of convergence for BART.

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-rockova19a, title = {On Theory for BART}, author = {Ro\v{c}kov\'a, Veronika and Saha, Enakshi}, booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics}, pages = {2839--2848}, year = {2019}, editor = {Chaudhuri, Kamalika and Sugiyama, Masashi}, volume = {89}, series = {Proceedings of Machine Learning Research}, month = {16--18 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v89/rockova19a/rockova19a.pdf}, url = {https://proceedings.mlr.press/v89/rockova19a.html}, abstract = {Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying down foundation for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into the branching processes theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes using their connection to random walks. We conclude with a result stating optimal rate of convergence for BART.} }
Endnote
%0 Conference Paper %T On Theory for BART %A Veronika Ročková %A Enakshi Saha %B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Masashi Sugiyama %F pmlr-v89-rockova19a %I PMLR %P 2839--2848 %U https://proceedings.mlr.press/v89/rockova19a.html %V 89 %X Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying down foundation for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into the branching processes theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes using their connection to random walks. We conclude with a result stating optimal rate of convergence for BART.
APA
Ročková, V. & Saha, E.. (2019). On Theory for BART. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:2839-2848 Available from https://proceedings.mlr.press/v89/rockova19a.html.

Related Material