Minimax Optimal Bayes Mixtures for Memoryless Sources over Large Alphabets
[edit]
Proceedings of Algorithmic Learning Theory, PMLR 83:470488, 2018.
Abstract
The normalized maximum likelihood (NML) distribution achieves minimax log loss and coding regret for the multinomial model. In practice other nearly minimax distributions are used instead as calculating the sequential probabilities needed for coding and prediction takes exponential time with NML. The Bayes mixture obtained with the Dirichlet prior $\operatorname{Dir}(1/2, …, 1/2)$ and asymptotically minimax modifications of it have been widely studied in the context of large sample sizes. Recently there has also been interest in minimax optimal coding distributions for large alphabets. We investigate Dirichlet priors that achieve minimax coding regret when the alphabet size $m$ is finite but large in comparison to the sample size $n$. We prove that a Bayes mixture with the Dirichlet prior $\operatorname{Dir}(1/3, …, 1/3)$ is optimal in this regime (in particular, when $m > \frac{5}{2} n + \frac{4}{n  2} + \frac{3}{2}$). The worstcase regret of the resulting distribution approaches the NML regret as the alphabet size grows.
Related Material


