On Theory for BART
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2839-2848, 2019.
Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying down foundation for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into the branching processes theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes using their connection to random walks. We conclude with a result stating optimal rate of convergence for BART.