Universality of Bayesian mixture predictors
Proceedings of the 28th International Conference on Algorithmic Learning Theory, PMLR 76:57-71, 2017.
Abstract
The problem is that of sequential probability forecasting for discrete-valued time series. The data is generated by an unknown probability distribution over the space of all one-way infinite sequences. It is known that this measure belongs to a given set $\mathcal{C}$, but the latter is completely arbitrary (uncountably infinite, without any structure given). The performance is measured by asymptotic average log loss. In this work it is shown that the minimax asymptotic performance is always attainable, and that it is attained by a Bayesian mixture over countably many measures from the set $\mathcal{C}$. This was previously known only for the case when the best achievable asymptotic error is 0. The new result can be interpreted as a complete-class theorem for prediction. It also contrasts with previous results showing that in the non-realizable case all Bayesian mixtures may be suboptimal. This leads to a very general conclusion concerning model selection for problems of sequential inference: it is better to take a model large enough to make sure it includes the process that generates the data, even if this entails a positive asymptotic average loss, for otherwise any combination of predictors in the model class may be useless.
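
For concreteness, one standard way to formalize the quantities mentioned above is sketched here; the notation ($\rho$, $\bar L$, $V_{\mathcal{C}}$, $w_k$) and the exact form of the loss are illustrative assumptions, not taken from the abstract itself. The asymptotic average log loss of a predictor $\rho$ on data generated by a measure $\mu$ can be written as
\[
  \bar L(\mu,\rho) \;=\; \limsup_{n\to\infty} \frac{1}{n}\, \mathbb{E}_{\mu}\!\left[-\log \rho(X_1,\dots,X_n)\right],
\]
the minimax asymptotic performance over the set $\mathcal{C}$ as
\[
  V_{\mathcal{C}} \;=\; \inf_{\rho}\, \sup_{\mu\in\mathcal{C}} \bar L(\mu,\rho),
\]
and a Bayesian mixture over countably many measures $\mu_k \in \mathcal{C}$ with prior weights $w_k$ as
\[
  \nu \;=\; \sum_{k\in\mathbb{N}} w_k\, \mu_k, \qquad w_k > 0,\ \ \sum_{k} w_k = 1 .
\]
In this notation, the main claim of the abstract reads: there exists such a mixture $\nu$ with $\sup_{\mu\in\mathcal{C}} \bar L(\mu,\nu) = V_{\mathcal{C}}$, whereas the previously known case is the one with $V_{\mathcal{C}} = 0$.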