Pitfalls in the use of Parallel Inference for the Dirichlet Process

Yarin Gal, Zoubin Ghahramani
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):208-216, 2014.

Abstract

Recent work by Lovell, Adams, and Mansinghka (2012) and Williamson, Dubey, and Xing (2013) suggested an alternative parametrisation of the Dirichlet process in order to derive non-approximate parallel MCMC inference for it; this work has been picked up and implemented in several different fields. In this paper we show that the suggested approach is impractical due to an extremely unbalanced distribution of the data. We characterise the requirements of efficient parallel inference for the Dirichlet process and show that the proposed inference fails most of these requirements (while approximate approaches often satisfy most of them). We present both theoretical and experimental evidence, analysing the load balance of the inference and showing that it is independent of the size of the dataset and of the number of nodes available in the parallel implementation. We end with suggestions of alternative paths of research for efficient non-approximate parallel inference for the Dirichlet process.
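The unbalanced-distribution claim can be illustrated with a small simulation (a hedged sketch of the underlying phenomenon, not the paper's own code): under the Chinese restaurant process representation of the Dirichlet process, the largest cluster typically holds a large fraction of the data, so a parallelisation that assigns whole clusters to nodes leaves one node with most of the work however many nodes are available.

```python
import random

def crp_cluster_sizes(n, alpha, seed=0):
    """Sample cluster sizes for n points from a Chinese restaurant
    process with concentration parameter alpha."""
    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of points currently in cluster k
    for i in range(n):
        # A new cluster opens with probability alpha / (i + alpha);
        # otherwise the point joins an existing cluster with
        # probability proportional to that cluster's size.
        if rng.random() < alpha / (i + alpha):
            sizes.append(1)
        else:
            r = rng.random() * i
            acc = 0.0
            for k, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[k] += 1
                    break
    return sorted(sizes, reverse=True)

sizes = crp_cluster_sizes(n=100_000, alpha=1.0)
# The largest cluster tends to hold a constant fraction of the data,
# independently of n, so cluster-per-node parallelisation is skewed.
print(len(sizes), sizes[0] / 100_000)
```

The fraction held by the largest cluster does not shrink as the dataset grows, which is the qualitative source of the load imbalance the paper analyses.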

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-gal14,
  title = {Pitfalls in the use of Parallel Inference for the Dirichlet Process},
  author = {Gal, Yarin and Ghahramani, Zoubin},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages = {208--216},
  year = {2014},
  editor = {Xing, Eric P. and Jebara, Tony},
  volume = {32},
  number = {2},
  series = {Proceedings of Machine Learning Research},
  address = {Beijing, China},
  month = {22--24 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v32/gal14.pdf},
  url = {https://proceedings.mlr.press/v32/gal14.html},
  abstract = {Recent work by Lovell, Adams, and Mansinghka (2012) and Williamson, Dubey, and Xing (2013) suggested an alternative parametrisation of the Dirichlet process in order to derive non-approximate parallel MCMC inference for it; this work has been picked up and implemented in several different fields. In this paper we show that the suggested approach is impractical due to an extremely unbalanced distribution of the data. We characterise the requirements of efficient parallel inference for the Dirichlet process and show that the proposed inference fails most of these requirements (while approximate approaches often satisfy most of them). We present both theoretical and experimental evidence, analysing the load balance of the inference and showing that it is independent of the size of the dataset and of the number of nodes available in the parallel implementation. We end with suggestions of alternative paths of research for efficient non-approximate parallel inference for the Dirichlet process.}
}
EndNote
%0 Conference Paper
%T Pitfalls in the use of Parallel Inference for the Dirichlet Process
%A Yarin Gal
%A Zoubin Ghahramani
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-gal14
%I PMLR
%P 208--216
%U https://proceedings.mlr.press/v32/gal14.html
%V 32
%N 2
%X Recent work by Lovell, Adams, and Mansinghka (2012) and Williamson, Dubey, and Xing (2013) suggested an alternative parametrisation of the Dirichlet process in order to derive non-approximate parallel MCMC inference for it; this work has been picked up and implemented in several different fields. In this paper we show that the suggested approach is impractical due to an extremely unbalanced distribution of the data. We characterise the requirements of efficient parallel inference for the Dirichlet process and show that the proposed inference fails most of these requirements (while approximate approaches often satisfy most of them). We present both theoretical and experimental evidence, analysing the load balance of the inference and showing that it is independent of the size of the dataset and of the number of nodes available in the parallel implementation. We end with suggestions of alternative paths of research for efficient non-approximate parallel inference for the Dirichlet process.
RIS
TY  - CPAPER
TI  - Pitfalls in the use of Parallel Inference for the Dirichlet Process
AU  - Yarin Gal
AU  - Zoubin Ghahramani
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-gal14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 208
EP  - 216
L1  - http://proceedings.mlr.press/v32/gal14.pdf
UR  - https://proceedings.mlr.press/v32/gal14.html
AB  - Recent work by Lovell, Adams, and Mansinghka (2012) and Williamson, Dubey, and Xing (2013) suggested an alternative parametrisation of the Dirichlet process in order to derive non-approximate parallel MCMC inference for it; this work has been picked up and implemented in several different fields. In this paper we show that the suggested approach is impractical due to an extremely unbalanced distribution of the data. We characterise the requirements of efficient parallel inference for the Dirichlet process and show that the proposed inference fails most of these requirements (while approximate approaches often satisfy most of them). We present both theoretical and experimental evidence, analysing the load balance of the inference and showing that it is independent of the size of the dataset and of the number of nodes available in the parallel implementation. We end with suggestions of alternative paths of research for efficient non-approximate parallel inference for the Dirichlet process.
ER  -
APA
Gal, Y. &amp; Ghahramani, Z. (2014). Pitfalls in the use of Parallel Inference for the Dirichlet Process. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):208-216. Available from https://proceedings.mlr.press/v32/gal14.html.
