Minimax Optimal Bayes Mixtures for Memoryless Sources over Large Alphabets

Elias Jääsaari, Janne Leppä-aho, Tomi Silander, Teemu Roos
Proceedings of Algorithmic Learning Theory, PMLR 83:470-488, 2018.

Abstract

The normalized maximum likelihood (NML) distribution achieves minimax log loss and coding regret for the multinomial model. In practice, other nearly minimax distributions are used instead, since calculating the sequential probabilities needed for coding and prediction takes exponential time under NML. The Bayes mixture obtained with the Dirichlet prior $\operatorname{Dir}(1/2, \ldots, 1/2)$ and asymptotically minimax modifications of it have been widely studied in the context of large sample sizes. Recently, there has also been interest in minimax optimal coding distributions for large alphabets. We investigate Dirichlet priors that achieve minimax coding regret when the alphabet size $m$ is finite but large in comparison to the sample size $n$. We prove that a Bayes mixture with the Dirichlet prior $\operatorname{Dir}(1/3, \ldots, 1/3)$ is optimal in this regime (in particular, when $m > \frac{5}{2} n + \frac{4}{n - 2} + \frac{3}{2}$). The worst-case regret of the resulting distribution approaches the NML regret as the alphabet size grows.
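
The quantities in the abstract are easy to compute for small cases. The following Python sketch (our illustration, not code from the paper) compares the worst-case regret of a symmetric Dirichlet Bayes mixture with the NML regret $\log C(n, m)$ for the multinomial model: $C(n, m)$ is evaluated with the linear-time recursion of Kontkanen and Myllymäki (2007), and since the regret of a Bayes mixture under a symmetric prior depends only on the multiset of nonzero symbol counts, the worst case can be found by maximizing over integer partitions of $n$.

# Sketch (not the authors' code): worst-case regret of a symmetric
# Dirichlet Bayes mixture vs. the NML regret log C(n, m) for the
# multinomial model, in nats.
from math import lgamma, log, comb

def log_nml_complexity(n, m):
    """log C(n, m) via the recursion C(n,k) = C(n,k-1) + (n/(k-2)) C(n,k-2)."""
    if m == 1:
        return 0.0
    # Base case C(n, 2) by direct summation; Python evaluates 0.0**0 as 1.0.
    c2 = sum(comb(n, r) * (r / n) ** r * ((n - r) / n) ** (n - r)
             for r in range(n + 1))
    c_prev, c_curr = 1.0, c2  # C(n, 1), C(n, 2)
    for k in range(3, m + 1):
        c_prev, c_curr = c_curr, c_curr + n / (k - 2) * c_prev
    return log(c_curr)

def partitions(n, max_part=None):
    """Integer partitions of n as non-increasing tuples."""
    max_part = n if max_part is None else max_part
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def worst_case_regret(n, m, alpha):
    """max over x^n of log p_ML(x^n) - log p_Bayes(x^n), Dir(alpha, ..., alpha) prior."""
    best = float("-inf")
    for part in partitions(n):
        if len(part) > m:
            continue  # would need more distinct symbols than the alphabet has
        log_ml = sum(k * log(k / n) for k in part)
        # Dirichlet-multinomial marginal of one sequence with these counts.
        log_bayes = (lgamma(m * alpha) - lgamma(n + m * alpha)
                     + sum(lgamma(k + alpha) - lgamma(alpha) for k in part))
        best = max(best, log_ml - log_bayes)
    return best

n = 4
m = 16  # above the paper's threshold (5/2)*4 + 4/(4-2) + 3/2 = 13.5
print(f"NML regret      : {log_nml_complexity(n, m):.4f}")      # ~6.47
print(f"Dir(1/3) regret : {worst_case_regret(n, m, 1/3):.4f}")  # ~6.48
print(f"Dir(1/2) regret : {worst_case_regret(n, m, 1/2):.4f}")  # ~7.10

In this small instance the sketch matches the abstract's claim qualitatively: above the threshold, the $\operatorname{Dir}(1/3, \ldots, 1/3)$ mixture's worst-case regret is below that of $\operatorname{Dir}(1/2, \ldots, 1/2)$ and already close to the NML regret.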

Cite this Paper


BibTeX
@InProceedings{pmlr-v83-jaasaari18a,
  title     = {Minimax Optimal Bayes Mixtures for Memoryless Sources over Large Alphabets},
  author    = {Jääsaari, Elias and Leppä-aho, Janne and Silander, Tomi and Roos, Teemu},
  booktitle = {Proceedings of Algorithmic Learning Theory},
  pages     = {470--488},
  year      = {2018},
  editor    = {Janoos, Firdaus and Mohri, Mehryar and Sridharan, Karthik},
  volume    = {83},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--09 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v83/jaasaari18a/jaasaari18a.pdf},
  url       = {https://proceedings.mlr.press/v83/jaasaari18a.html}
}
APA
Jääsaari, E., Leppä-aho, J., Silander, T., & Roos, T. (2018). Minimax Optimal Bayes Mixtures for Memoryless Sources over Large Alphabets. Proceedings of Algorithmic Learning Theory, in Proceedings of Machine Learning Research 83:470-488. Available from https://proceedings.mlr.press/v83/jaasaari18a.html.