Universality of Bayesian mixture predictors

Daniil Ryabko
Proceedings of the 28th International Conference on Algorithmic Learning Theory, PMLR 76:57-71, 2017.

Abstract

The problem is that of sequential probability forecasting for discrete-valued time series. The data is generated by an unknown probability distribution over the space of all one-way infinite sequences. It is known that this measure belongs to a given set $\mathcal{C}$, but the latter is completely arbitrary (uncountably infinite, without any structure given). The performance is measured by asymptotic average log loss. In this work it is shown that the minimax asymptotic performance is always attainable, and it is attained by a Bayesian mixture over countably many measures from the set $\mathcal{C}$. This was previously only known for the case when the best achievable asymptotic error is 0. The new result can be interpreted as a complete-class theorem for prediction. It also contrasts with previous results showing that in the non-realizable case all Bayesian mixtures may be suboptimal. This leads to a very general conclusion concerning model selection for problems of sequential inference: it is better to take a model large enough to make sure it includes the process that generates the data, even if it entails positive asymptotic average loss, for otherwise any combination of predictors in the model class may be useless.
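
To make the statement concrete, here is a sketch of the usual notation for this setting (the specific symbols $\bar{L}$, $V_{\mathcal{C}}$, $w_k$ below are illustrative assumptions and need not match the paper's): data $x_1, x_2, \dots$ are drawn from an unknown measure $\mu \in \mathcal{C}$ over one-way infinite sequences, a predictor is any probability measure $\rho$ over the same space, and the asymptotic average log loss and the minimax value are

$$\bar{L}(\mu,\rho) \;:=\; \limsup_{n\to\infty}\, -\tfrac{1}{n}\,\mathbb{E}_{\mu}\!\left[\log \rho(x_1,\dots,x_n)\right], \qquad V_{\mathcal{C}} \;:=\; \inf_{\rho}\,\sup_{\mu\in\mathcal{C}} \bar{L}(\mu,\rho).$$

In this notation, the main claim of the abstract reads: there exist measures $\mu_k \in \mathcal{C}$ and weights $w_k > 0$ with $\sum_k w_k = 1$ such that the Bayesian mixture $\nu := \sum_{k\in\mathbb{N}} w_k\,\mu_k$ attains the minimax value, $\sup_{\mu\in\mathcal{C}} \bar{L}(\mu,\nu) = V_{\mathcal{C}}$.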

Cite this Paper


BibTeX
@InProceedings{pmlr-v76-ryabko17a,
  title     = {Universality of Bayesian mixture predictors},
  author    = {Ryabko, Daniil},
  booktitle = {Proceedings of the 28th International Conference on Algorithmic Learning Theory},
  pages     = {57--71},
  year      = {2017},
  editor    = {Hanneke, Steve and Reyzin, Lev},
  volume    = {76},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Oct},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v76/ryabko17a/ryabko17a.pdf},
  url       = {https://proceedings.mlr.press/v76/ryabko17a.html},
  abstract  = {The problem is that of sequential probability forecasting for discrete-valued time series. The data is generated by an unknown probability distribution over the space of all one-way infinite sequences. It is known that this measure belongs to a given set $\mathcal{C}$, but the latter is completely arbitrary (uncountably infinite, without any structure given). The performance is measured by asymptotic average log loss. In this work it is shown that the minimax asymptotic performance is always attainable, and it is attained by a Bayesian mixture over countably many measures from the set $\mathcal{C}$. This was previously only known for the case when the best achievable asymptotic error is 0. The new result can be interpreted as a complete-class theorem for prediction. It also contrasts previous results that show that in the non-realizable case all Bayesian mixtures may be suboptimal. This leads to a very general conclusion concerning model selection for a problem of sequential inference: it is better to take a model large enough to make sure it includes the process that generates the data, even if it entails positive asymptotic average loss, for otherwise any combination of predictors in the model class may be useless.}
}
APA
Ryabko, D. (2017). Universality of Bayesian mixture predictors. Proceedings of the 28th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 76:57-71. Available from https://proceedings.mlr.press/v76/ryabko17a.html.
