Entropic characterization of optimal rates for learning Gaussian mixtures

Zeyu Jia, Yury Polyanskiy, Yihong Wu
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:4296-4335, 2023.

Abstract

We consider the question of estimating multi-dimensional Gaussian mixtures (GM) with compactly supported or subgaussian mixing distributions. The minimax estimation rate for this class (under Hellinger, TV and KL divergences) is a long-standing open question, even for dimension one. In this paper we characterize this rate (in all dimensions) in terms of the metric entropy of the class. Such characterizations originate from the seminal works of Le Cam (1973); Birgé (1983); Haussler and Opper (1997); Yang and Barron (1999). However, for GMs a key ingredient missing from earlier work (and widely sought after) is a comparison result showing that the KL divergence and the squared Hellinger distance are within a constant multiple of each other uniformly over the class. Our main technical contribution is to show this fact, from which we derive the entropy characterization of the estimation rate under Hellinger and KL. Interestingly, the sequential (online learning) estimation rate is characterized by the global entropy, while the single-step (batch) rate corresponds to the local entropy, paralleling a similar recent discovery for the Gaussian sequence model in the pair of works Neykov (2022); Mourtada (2023). Additionally, since Hellinger is a proper metric, our comparison shows that GMs under KL satisfy a version of the triangle inequality (with a multiplicative constant), implying that proper and improper estimation rates coincide.
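In schematic form, the comparison result and the weak triangle inequality it implies read as follows. This is a LaTeX sketch with placeholder constants C and C'; the precise constants, conditions, and Hellinger normalization are those stated in the paper.

% KL vs. squared Hellinger, uniformly over Gaussian mixtures P, Q whose
% mixing distributions are compactly supported (or subgaussian):
\[
  h^2(P,Q) \;\lesssim\; \mathrm{KL}(P \,\|\, Q) \;\le\; C \, h^2(P,Q).
\]
% Since h is a proper metric, h^2 obeys a weak triangle inequality, and the
% two-sided comparison transfers it to KL restricted to this class:
\[
  \mathrm{KL}(P \,\|\, R) \;\le\; C' \bigl( \mathrm{KL}(P \,\|\, Q) + \mathrm{KL}(Q \,\|\, R) \bigr).
\]

The second display is what allows one to pass between proper and improper estimates at the cost of constant factors, which is why the proper and improper estimation rates coincide.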

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-jia23a,
  title     = {Entropic characterization of optimal rates for learning Gaussian mixtures},
  author    = {Jia, Zeyu and Polyanskiy, Yury and Wu, Yihong},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {4296--4335},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/jia23a/jia23a.pdf},
  url       = {https://proceedings.mlr.press/v195/jia23a.html},
  abstract  = {We consider the question of estimating multi-dimensional Gaussian mixtures (GM) with compactly supported or subgaussian mixing distributions. Minimax estimation rate for this class (under Hellinger, TV and KL divergences) is a long-standing open question, even for dimension one. In this paper we characterize this rate (in all dimensions) in terms of the metric entropy of the class. Such characterizations originate from seminal works of Le Cam (1973); Birg{\'e} (1983); Haussler and Opper (1997); Yang and Barron (1999). However, for GMs a key ingredient missing from earlier work (and widely sought-after) is a comparison result showing that the KL and the squared Hellinger distance are within a constant multiple of each other uniformly over the class. Our main technical contribution is in showing this fact, from which we derive entropy characterization for estimation rate under Hellinger and KL. Interestingly, the sequential (online learning) estimation rate is characterized by the global entropy, while the single-step (batch) rate corresponds to local entropy, paralleling a similar recent discovery for the case of Gaussian sequence model in a pair of works Neykov (2022); Mourtada (2023). Additionally, since Hellinger is a proper metric, our comparison shows that GMs under KL satisfy a version of triangle inequality (with a multiplicative constant), implying that proper and improper estimation rates coincide.}
}
Endnote
%0 Conference Paper
%T Entropic characterization of optimal rates for learning Gaussian mixtures
%A Zeyu Jia
%A Yury Polyanskiy
%A Yihong Wu
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-jia23a
%I PMLR
%P 4296--4335
%U https://proceedings.mlr.press/v195/jia23a.html
%V 195
%X We consider the question of estimating multi-dimensional Gaussian mixtures (GM) with compactly supported or subgaussian mixing distributions. Minimax estimation rate for this class (under Hellinger, TV and KL divergences) is a long-standing open question, even for dimension one. In this paper we characterize this rate (in all dimensions) in terms of the metric entropy of the class. Such characterizations originate from seminal works of Le Cam (1973); Birgé (1983); Haussler and Opper (1997); Yang and Barron (1999). However, for GMs a key ingredient missing from earlier work (and widely sought-after) is a comparison result showing that the KL and the squared Hellinger distance are within a constant multiple of each other uniformly over the class. Our main technical contribution is in showing this fact, from which we derive entropy characterization for estimation rate under Hellinger and KL. Interestingly, the sequential (online learning) estimation rate is characterized by the global entropy, while the single-step (batch) rate corresponds to local entropy, paralleling a similar recent discovery for the case of Gaussian sequence model in a pair of works Neykov (2022); Mourtada (2023). Additionally, since Hellinger is a proper metric, our comparison shows that GMs under KL satisfy a version of triangle inequality (with a multiplicative constant), implying that proper and improper estimation rates coincide.
APA
Jia, Z., Polyanskiy, Y. & Wu, Y. (2023). Entropic characterization of optimal rates for learning Gaussian mixtures. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:4296-4335. Available from https://proceedings.mlr.press/v195/jia23a.html.
