The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators

Fares Hedayati; Peter L. Bartlett

The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators

Fares Hedayati, Peter L. Bartlett

Proceedings of the 25th Annual Conference on Learning Theory, PMLR 23:7.1-7.13, 2012.

Abstract

We study online learning under logarithmic loss with regular parametric models. We show that a Bayesian strategy predicts optimally only if it uses Jeffreys prior. This result was known for canonical exponential families; we extend it to parametric models for which the maximum likelihood estimator is asymptotically normal. The optimal prediction strategy, normalized maximum likelihood, depends on the number \emphn of rounds of the game, in general. However, when a Bayesian strategy is optimal, normalized maximum likelihood becomes independent of \emphn. Our proof uses this to exploit the asymptotics of normalized maximum likelihood. The asymptotic normality of the maximum likelihood estimator is responsible for the necessity of Jeffreys prior.

Cite this Paper

BibTeX


@InProceedings{pmlr-v23-hedayati12,
  title = 	 {The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators},
  author = 	 {Hedayati, Fares and Bartlett, Peter L.},
  booktitle = 	 {Proceedings of the 25th Annual Conference on Learning Theory},
  pages = 	 {7.1--7.13},
  year = 	 {2012},
  editor = 	 {Mannor, Shie and Srebro, Nathan and Williamson, Robert C.},
  volume = 	 {23},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Edinburgh, Scotland},
  month = 	 {25--27 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v23/hedayati12/hedayati12.pdf},
  url = 	 {https://proceedings.mlr.press/v23/hedayati12.html},
  abstract = 	 {We study online learning under logarithmic loss with regular parametric models. We show that a Bayesian strategy predicts optimally only if it uses Jeffreys prior. This result was known for canonical exponential families; we extend it to parametric models for which the maximum likelihood estimator is asymptotically normal. The optimal prediction strategy, normalized maximum likelihood, depends on the number \emphn of rounds of the game, in general. However, when a Bayesian strategy is optimal, normalized maximum likelihood becomes independent of \emphn. Our proof uses this to exploit the asymptotics of normalized maximum likelihood. The asymptotic normality of the maximum likelihood estimator is responsible for the necessity of Jeffreys prior.}
}

Endnote

%0 Conference Paper
%T The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators
%A Fares Hedayati
%A Peter L. Bartlett
%B Proceedings of the 25th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2012
%E Shie Mannor
%E Nathan Srebro
%E Robert C. Williamson	
%F pmlr-v23-hedayati12
%I PMLR
%P 7.1--7.13
%U https://proceedings.mlr.press/v23/hedayati12.html
%V 23
%X We study online learning under logarithmic loss with regular parametric models. We show that a Bayesian strategy predicts optimally only if it uses Jeffreys prior. This result was known for canonical exponential families; we extend it to parametric models for which the maximum likelihood estimator is asymptotically normal. The optimal prediction strategy, normalized maximum likelihood, depends on the number \emphn of rounds of the game, in general. However, when a Bayesian strategy is optimal, normalized maximum likelihood becomes independent of \emphn. Our proof uses this to exploit the asymptotics of normalized maximum likelihood. The asymptotic normality of the maximum likelihood estimator is responsible for the necessity of Jeffreys prior.

RIS


TY  - CPAPER
TI  - The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators
AU  - Fares Hedayati
AU  - Peter L. Bartlett
BT  - Proceedings of the 25th Annual Conference on Learning Theory
DA  - 2012/06/16
ED  - Shie Mannor
ED  - Nathan Srebro
ED  - Robert C. Williamson	
ID  - pmlr-v23-hedayati12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 23
SP  - 7.1
EP  - 7.13
L1  - http://proceedings.mlr.press/v23/hedayati12/hedayati12.pdf
UR  - https://proceedings.mlr.press/v23/hedayati12.html
AB  - We study online learning under logarithmic loss with regular parametric models. We show that a Bayesian strategy predicts optimally only if it uses Jeffreys prior. This result was known for canonical exponential families; we extend it to parametric models for which the maximum likelihood estimator is asymptotically normal. The optimal prediction strategy, normalized maximum likelihood, depends on the number \emphn of rounds of the game, in general. However, when a Bayesian strategy is optimal, normalized maximum likelihood becomes independent of \emphn. Our proof uses this to exploit the asymptotics of normalized maximum likelihood. The asymptotic normality of the maximum likelihood estimator is responsible for the necessity of Jeffreys prior.
ER  -

APA


Hedayati, F. & Bartlett, P.L.. (2012). The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators. Proceedings of the 25th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 23:7.1-7.13 Available from https://proceedings.mlr.press/v23/hedayati12.html.

Related Material

Download PDF