Mondrian Forests for Large-Scale Regression when Uncertainty Matters

Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1478-1487, 2016.

Abstract

Many real-world regression problems demand a measure of the uncertainty associated with each prediction. Standard decision forests deliver efficient state-of-the-art predictive performance, but high-quality uncertainty estimates are lacking. Gaussian processes (GPs) deliver uncertainty estimates, but scaling GPs to large-scale data sets comes at the cost of approximating the uncertainty estimates. We extend Mondrian forests, first proposed by Lakshminarayanan et al. (2014) for classification problems, to the large-scale nonparametric regression setting. Using a novel hierarchical Gaussian prior that dovetails with the Mondrian forest framework, we obtain principled uncertainty estimates, while still retaining the computational advantages of decision forests. Through a combination of illustrative examples, real-world large-scale datasets and Bayesian optimization benchmarks, we demonstrate that Mondrian forests outperform approximate GPs on large-scale regression tasks and deliver better-calibrated uncertainty assessments than decision-forest-based methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-lakshminarayanan16,
  title = {Mondrian Forests for Large-Scale Regression when Uncertainty Matters},
  author = {Balaji Lakshminarayanan and Daniel M. Roy and Yee Whye Teh},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = {1478--1487},
  year = {2016},
  editor = {Arthur Gretton and Christian C. Robert},
  volume = {51},
  series = {Proceedings of Machine Learning Research},
  address = {Cadiz, Spain},
  month = {09--11 May},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v51/lakshminarayanan16.pdf},
  url = {http://proceedings.mlr.press/v51/lakshminarayanan16.html},
  abstract = {Many real-world regression problems demand a measure of the uncertainty associated with each prediction. Standard decision forests deliver efficient state-of-the-art predictive performance, but high-quality uncertainty estimates are lacking. Gaussian processes (GPs) deliver uncertainty estimates, but scaling GPs to large-scale data sets comes at the cost of approximating the uncertainty estimates. We extend Mondrian forests, first proposed by Lakshminarayanan et al. (2014) for classification problems, to the large-scale nonparametric regression setting. Using a novel hierarchical Gaussian prior that dovetails with the Mondrian forest framework, we obtain principled uncertainty estimates, while still retaining the computational advantages of decision forests. Through a combination of illustrative examples, real-world large-scale datasets and Bayesian optimization benchmarks, we demonstrate that Mondrian forests outperform approximate GPs on large-scale regression tasks and deliver better-calibrated uncertainty assessments than decision-forest-based methods.}
}
Endnote
%0 Conference Paper
%T Mondrian Forests for Large-Scale Regression when Uncertainty Matters
%A Balaji Lakshminarayanan
%A Daniel M. Roy
%A Yee Whye Teh
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-lakshminarayanan16
%I PMLR
%P 1478--1487
%U http://proceedings.mlr.press/v51/lakshminarayanan16.html
%V 51
%X Many real-world regression problems demand a measure of the uncertainty associated with each prediction. Standard decision forests deliver efficient state-of-the-art predictive performance, but high-quality uncertainty estimates are lacking. Gaussian processes (GPs) deliver uncertainty estimates, but scaling GPs to large-scale data sets comes at the cost of approximating the uncertainty estimates. We extend Mondrian forests, first proposed by Lakshminarayanan et al. (2014) for classification problems, to the large-scale nonparametric regression setting. Using a novel hierarchical Gaussian prior that dovetails with the Mondrian forest framework, we obtain principled uncertainty estimates, while still retaining the computational advantages of decision forests. Through a combination of illustrative examples, real-world large-scale datasets and Bayesian optimization benchmarks, we demonstrate that Mondrian forests outperform approximate GPs on large-scale regression tasks and deliver better-calibrated uncertainty assessments than decision-forest-based methods.
RIS
TY  - CPAPER
TI  - Mondrian Forests for Large-Scale Regression when Uncertainty Matters
AU  - Balaji Lakshminarayanan
AU  - Daniel M. Roy
AU  - Yee Whye Teh
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-lakshminarayanan16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 1478
EP  - 1487
L1  - http://proceedings.mlr.press/v51/lakshminarayanan16.pdf
UR  - http://proceedings.mlr.press/v51/lakshminarayanan16.html
AB  - Many real-world regression problems demand a measure of the uncertainty associated with each prediction. Standard decision forests deliver efficient state-of-the-art predictive performance, but high-quality uncertainty estimates are lacking. Gaussian processes (GPs) deliver uncertainty estimates, but scaling GPs to large-scale data sets comes at the cost of approximating the uncertainty estimates. We extend Mondrian forests, first proposed by Lakshminarayanan et al. (2014) for classification problems, to the large-scale nonparametric regression setting. Using a novel hierarchical Gaussian prior that dovetails with the Mondrian forest framework, we obtain principled uncertainty estimates, while still retaining the computational advantages of decision forests. Through a combination of illustrative examples, real-world large-scale datasets and Bayesian optimization benchmarks, we demonstrate that Mondrian forests outperform approximate GPs on large-scale regression tasks and deliver better-calibrated uncertainty assessments than decision-forest-based methods.
ER  -
APA
Lakshminarayanan, B., Roy, D.M. & Teh, Y.W. (2016). Mondrian Forests for Large-Scale Regression when Uncertainty Matters. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1478-1487. Available from http://proceedings.mlr.press/v51/lakshminarayanan16.html
