Sparse Adaptive Dirichlet-Multinomial-like Processes

Marcus Hutter

Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:432-459, 2013.

Abstract

Online estimation and modelling of i.i.d. data for short sequences over large or complex “alphabets” is a ubiquitous (sub)problem in machine learning, information theory, data compression, statistical language processing, and document analysis. The Dirichlet-Multinomial distribution (also called Polya urn scheme) and extensions thereof are widely applied for online i.i.d. estimation. Good a-priori choices for the parameters in this regime are difficult to obtain though. I derive an optimal adaptive choice for the main parameter via tight, data-dependent redundancy bounds for a related model. The 1-line recommendation is to set the ‘total mass’ = ‘precision’ = ‘concentration’ parameter to $m/[2\ln\frac{n+1}{m}]$, where n is the (past) sample size and m the number of different symbols observed (so far). The resulting estimator is simple, online, fast, and experimental performance is superb.
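The one-line recommendation can be illustrated with a minimal sketch. It plugs the adaptive total mass from the abstract into an ordinary symmetric Dirichlet-Multinomial predictive rule. The function names, the known finite `alphabet_size`, the uniform base distribution, and the uniform fallback for an empty history are all assumptions made for illustration; the paper derives its bounds for a related model, which this sketch does not claim to reproduce.

```python
import math
from collections import Counter


def adaptive_concentration(n: int, m: int) -> float:
    """Total mass m / (2 ln((n+1)/m)), as stated in the abstract.

    n is the past sample size, m the number of distinct symbols seen so far.
    """
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    return m / (2.0 * math.log((n + 1) / m))


def predict(counts: Counter, symbol: str, alphabet_size: int) -> float:
    """Probability that `symbol` comes next under a symmetric
    Dirichlet-Multinomial with adaptive total mass (illustrative assumption).
    """
    n = sum(counts.values())
    m = sum(1 for c in counts.values() if c > 0)
    if n == 0:
        # The formula is undefined without data; fall back to uniform (assumption).
        return 1.0 / alphabet_size
    beta = adaptive_concentration(n, m)
    # Standard Dirichlet-Multinomial rule with per-symbol mass beta / alphabet_size.
    return (counts[symbol] + beta / alphabet_size) / (n + beta)


if __name__ == "__main__":
    history = Counter("abracadabra")  # n = 11, m = 5
    for s in "az":
        print(s, predict(history, s, alphabet_size=26))
```

Because the total mass is recomputed from n and m at every step, the estimator stays online: only the symbol counts need to be stored between predictions.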

Cite this Paper

BibTeX


@InProceedings{pmlr-v30-Hutter13,
  title = 	 {Sparse Adaptive Dirichlet-Multinomial-like Processes},
  author = 	 {Hutter, Marcus},
  booktitle = 	 {Proceedings of the 26th Annual Conference on Learning Theory},
  pages = 	 {432--459},
  year = 	 {2013},
  editor = 	 {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume = 	 {30},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Princeton, NJ, USA},
  month = 	 {12--14 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v30/Hutter13.pdf},
  url = 	 {https://proceedings.mlr.press/v30/Hutter13.html},
  abstract = 	 {Online estimation and modelling of i.i.d. data for short sequences over large or complex “alphabets” is a ubiquitous (sub)problem in machine learning, information theory, data compression, statistical language processing, and document analysis. The Dirichlet-Multinomial distribution (also called Polya urn scheme) and extensions thereof are widely applied for online i.i.d. estimation. Good a-priori choices for the parameters in this regime are difficult to obtain though. I derive an optimal adaptive choice for the main parameter via tight, data-dependent redundancy bounds for a related model. The 1-line recommendation is to set the ‘total mass’ = ‘precision’ = ‘concentration’ parameter to $m/[2\ln\frac{n+1}{m}]$, where n is the (past) sample size and m the number of different symbols observed (so far). The resulting estimator is simple, online, fast, and experimental performance is superb.}
}

Endnote

%0 Conference Paper
%T Sparse Adaptive Dirichlet-Multinomial-like Processes
%A Marcus Hutter
%B Proceedings of the 26th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2013
%E Shai Shalev-Shwartz
%E Ingo Steinwart
%F pmlr-v30-Hutter13
%I PMLR
%P 432--459
%U https://proceedings.mlr.press/v30/Hutter13.html
%V 30
%X Online estimation and modelling of i.i.d. data for short sequences over large or complex “alphabets” is a ubiquitous (sub)problem in machine learning, information theory, data compression, statistical language processing, and document analysis. The Dirichlet-Multinomial distribution (also called Polya urn scheme) and extensions thereof are widely applied for online i.i.d. estimation. Good a-priori choices for the parameters in this regime are difficult to obtain though. I derive an optimal adaptive choice for the main parameter via tight, data-dependent redundancy bounds for a related model. The 1-line recommendation is to set the ‘total mass’ = ‘precision’ = ‘concentration’ parameter to $m/[2\ln\frac{n+1}{m}]$, where n is the (past) sample size and m the number of different symbols observed (so far). The resulting estimator is simple, online, fast, and experimental performance is superb.

RIS


TY  - CPAPER
TI  - Sparse Adaptive Dirichlet-Multinomial-like Processes
AU  - Marcus Hutter
BT  - Proceedings of the 26th Annual Conference on Learning Theory
DA  - 2013/06/13
ED  - Shai Shalev-Shwartz
ED  - Ingo Steinwart
ID  - pmlr-v30-Hutter13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 30
SP  - 432
EP  - 459
L1  - http://proceedings.mlr.press/v30/Hutter13.pdf
UR  - https://proceedings.mlr.press/v30/Hutter13.html
AB  - Online estimation and modelling of i.i.d. data for short sequences over large or complex “alphabets” is a ubiquitous (sub)problem in machine learning, information theory, data compression, statistical language processing, and document analysis. The Dirichlet-Multinomial distribution (also called Polya urn scheme) and extensions thereof are widely applied for online i.i.d. estimation. Good a-priori choices for the parameters in this regime are difficult to obtain though. I derive an optimal adaptive choice for the main parameter via tight, data-dependent redundancy bounds for a related model. The 1-line recommendation is to set the ‘total mass’ = ‘precision’ = ‘concentration’ parameter to $m/[2\ln\frac{n+1}{m}]$, where n is the (past) sample size and m the number of different symbols observed (so far). The resulting estimator is simple, online, fast, and experimental performance is superb.
ER  -

APA


Hutter, M. (2013). Sparse Adaptive Dirichlet-Multinomial-like Processes. Proceedings of the 26th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 30:432-459. Available from https://proceedings.mlr.press/v30/Hutter13.html.
