Prediction from compression for models with infinite memory, with applications to hidden Markov and renewal processes
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:2270-2307, 2024.
Abstract
Consider the problem of predicting the next symbol given a sample path of length $n$, whose joint distribution belongs to a distribution class that may have long-term memory. The goal is to compete with the conditional predictor that knows the true model. For both hidden Markov models (HMMs) and renewal processes, we determine the optimal prediction risk in Kullback-Leibler divergence up to universal constant factors. Extending existing results on finite-order Markov models (Han et al., 2023) and drawing on ideas from universal compression, we propose an estimator whose prediction risk is bounded by the redundancy of the distribution class plus a memory term that accounts for the long-range dependence of the model. Notably, for HMMs with bounded state and observation spaces, a polynomial-time estimator based on dynamic programming is shown to achieve the optimal prediction risk $\Theta(\frac{\log n}{n})$; prior to this work, the only known result of this type was the $O(\frac{1}{\log n})$ bound obtained via Markov approximation (Sharan et al., 2018). Matching minimax lower bounds are obtained by making connections to redundancy and mutual information via a reduction argument.
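As a point of reference for the benchmark in the abstract, the oracle conditional predictor for an HMM with known parameters can be computed exactly by the standard forward recursion (dynamic programming over the hidden state), and its predictive distribution depends on the entire observed past, which is the long-range dependence the paper addresses. The sketch below is purely illustrative and is not the paper's estimator; the function name `next_symbol_distribution` and the toy parameters `T`, `O`, `pi` are assumptions made for the example.

```python
import numpy as np

def next_symbol_distribution(T, O, pi, obs):
    """Oracle predictive distribution P(X_{n+1} = . | X_1, ..., X_n) for an HMM
    with known parameters, computed by the standard forward recursion.

    T   : (k, k) transition matrix, T[i, j] = P(next state j | state i)
    O   : (k, m) emission matrix,   O[i, x] = P(symbol x | state i)
    pi  : (k,)   initial state distribution
    obs : observed symbols x_1, ..., x_n (integers in [0, m))
    """
    # Forward pass: alpha[i] = P(state_t = i | x_1, ..., x_t), renormalized at
    # each step to avoid numerical underflow.
    alpha = pi * O[:, obs[0]]
    alpha /= alpha.sum()
    for x in obs[1:]:
        alpha = (alpha @ T) * O[:, x]
        alpha /= alpha.sum()
    # Propagate the state belief one step forward and emit:
    # P(x_{n+1} | x^n) = sum_{i,j} alpha[i] T[i, j] O[j, x_{n+1}].
    return (alpha @ T) @ O

# Toy example: a 2-state HMM over a binary observation alphabet.
T = np.array([[0.9, 0.1], [0.2, 0.8]])
O = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.5, 0.5])
print(next_symbol_distribution(T, O, pi, [0, 1, 1, 0]))
```

The prediction risk studied in the paper measures, in Kullback-Leibler divergence, how far an estimator built from the sample path alone falls short of this oracle predictive distribution.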