Large-scale Distributed Dependent Nonparametric Trees

Zhiting Hu; Qirong Ho; Avinava Dubey; Eric Xing

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1651-1659, 2015.

Abstract

Practical applications of Bayesian nonparametric (BNP) models have been limited, due to their high computational complexity and poor scaling on large data. In this paper, we consider dependent nonparametric trees (DNTs), a powerful infinite model that captures time-evolving hierarchies, and develop a large-scale distributed training system. Our major contributions include: (1) an effective memoized variational inference for DNTs, with a novel birth-merge strategy for exploring the unbounded tree space; (2) a model-parallel scheme for concurrent tree growing/pruning and efficient model alignment, through conflict-free model partitioning and lightweight synchronization; (3) a data-parallel scheme for variational parameter updates that allows distributed processing of massive data. Using 64 cores in 36 hours, our system learns a 10K-node DNT topic model on 8M documents that captures both high-frequency and long-tail topics. Our data and model scales are orders-of-magnitude larger than recent results on the hierarchical Dirichlet process, and the near-linear scalability indicates great potential for even bigger problem sizes.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-hu15,
  title = 	 {Large-scale Distributed Dependent Nonparametric Trees},
  author = 	 {Hu, Zhiting and Ho, Qirong and Dubey, Avinava and Xing, Eric},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1651--1659},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/hu15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/hu15.html},
  abstract = 	 {Practical applications of Bayesian nonparametric (BNP) models have been limited, due to their high computational complexity and poor scaling on large data. In this paper, we consider dependent nonparametric trees (DNTs), a powerful infinite model that captures time-evolving hierarchies, and develop a large-scale distributed training system. Our major contributions include: (1) an effective memoized variational inference for DNTs, with a novel birth-merge strategy for exploring the unbounded tree space; (2) a model-parallel scheme for concurrent tree growing/pruning and efficient model alignment, through conflict-free model partitioning and lightweight synchronization; (3) a data-parallel scheme for variational parameter updates that allows distributed processing of massive data. Using 64 cores in 36 hours, our system learns a 10K-node DNT topic model on 8M documents that captures both high-frequency and long-tail topics. Our data and model scales are orders-of-magnitude larger than recent results on the hierarchical Dirichlet process, and the near-linear scalability indicates great potential for even bigger problem sizes.}
}

Endnote

%0 Conference Paper
%T Large-scale Distributed Dependent Nonparametric Trees
%A Zhiting Hu
%A Qirong Ho
%A Avinava Dubey
%A Eric Xing
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-hu15
%I PMLR
%P 1651--1659
%U https://proceedings.mlr.press/v37/hu15.html
%V 37
%X Practical applications of Bayesian nonparametric (BNP) models have been limited, due to their high computational complexity and poor scaling on large data. In this paper, we consider dependent nonparametric trees (DNTs), a powerful infinite model that captures time-evolving hierarchies, and develop a large-scale distributed training system. Our major contributions include: (1) an effective memoized variational inference for DNTs, with a novel birth-merge strategy for exploring the unbounded tree space; (2) a model-parallel scheme for concurrent tree growing/pruning and efficient model alignment, through conflict-free model partitioning and lightweight synchronization; (3) a data-parallel scheme for variational parameter updates that allows distributed processing of massive data. Using 64 cores in 36 hours, our system learns a 10K-node DNT topic model on 8M documents that captures both high-frequency and long-tail topics. Our data and model scales are orders-of-magnitude larger than recent results on the hierarchical Dirichlet process, and the near-linear scalability indicates great potential for even bigger problem sizes.

RIS


TY  - CPAPER
TI  - Large-scale Distributed Dependent Nonparametric Trees
AU  - Zhiting Hu
AU  - Qirong Ho
AU  - Avinava Dubey
AU  - Eric Xing
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-hu15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1651
EP  - 1659
L1  - http://proceedings.mlr.press/v37/hu15.pdf
UR  - https://proceedings.mlr.press/v37/hu15.html
AB  - Practical applications of Bayesian nonparametric (BNP) models have been limited, due to their high computational complexity and poor scaling on large data. In this paper, we consider dependent nonparametric trees (DNTs), a powerful infinite model that captures time-evolving hierarchies, and develop a large-scale distributed training system. Our major contributions include: (1) an effective memoized variational inference for DNTs, with a novel birth-merge strategy for exploring the unbounded tree space; (2) a model-parallel scheme for concurrent tree growing/pruning and efficient model alignment, through conflict-free model partitioning and lightweight synchronization; (3) a data-parallel scheme for variational parameter updates that allows distributed processing of massive data. Using 64 cores in 36 hours, our system learns a 10K-node DNT topic model on 8M documents that captures both high-frequency and long-tail topics. Our data and model scales are orders-of-magnitude larger than recent results on the hierarchical Dirichlet process, and the near-linear scalability indicates great potential for even bigger problem sizes.
ER  -

APA


Hu, Z., Ho, Q., Dubey, A. & Xing, E. (2015). Large-scale Distributed Dependent Nonparametric Trees. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1651-1659. Available from https://proceedings.mlr.press/v37/hu15.html.