On Theory for BART

Veronika Ročková; Enakshi Saha

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2839-2848, 2019.

Abstract

Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying down a foundation for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into the theory of branching processes. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes using their connection to random walks. We conclude with a result stating the optimal rate of convergence for BART.

Cite this Paper

BibTeX


@InProceedings{pmlr-v89-rockova19a,
  title     = {On Theory for BART},
  author    = {Ro\v{c}kov\'a, Veronika and Saha, Enakshi},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {2839--2848},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/rockova19a/rockova19a.pdf},
  url       = {https://proceedings.mlr.press/v89/rockova19a.html},
  abstract  = {Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying down foundation for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into the branching processes theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes using their connection to random walks. We conclude with a result stating optimal rate of convergence for BART.}
}

Endnote

%0 Conference Paper
%T On Theory for BART
%A Veronika Ročková
%A Enakshi Saha
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-rockova19a
%I PMLR
%P 2839--2848
%U https://proceedings.mlr.press/v89/rockova19a.html
%V 89
%X Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying down foundation for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into the branching processes theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes using their connection to random walks. We conclude with a result stating optimal rate of convergence for BART.

APA


Ročková, V., & Saha, E. (2019). On Theory for BART. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:2839-2848. Available from https://proceedings.mlr.press/v89/rockova19a.html.
