Learning multivariate Gaussians with imperfect advice

Arnab Bhattacharyya, Davin Choo, Philips George John, Themis Gouleakis
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:4210-4252, 2025.

Abstract

We revisit the problem of distribution learning within the framework of learning-augmented algorithms. In this setting, we explore the scenario where a probability distribution is provided as potentially inaccurate advice on the true, unknown distribution. Our objective is to develop learning algorithms whose sample complexity decreases as the quality of the advice improves, thereby surpassing standard learning lower bounds when the advice is sufficiently accurate. Specifically, we demonstrate that this outcome is achievable for the problem of learning a multivariate Gaussian distribution $N(\mu, \Sigma)$ in the PAC learning setting. Classically, in the advice-free setting, $\widetilde{\Theta}(d^2/\varepsilon^2)$ samples are sufficient and worst case necessary to learn $d$-dimensional Gaussians up to TV distance $\varepsilon$ with constant probability. When we are additionally given a parameter $\widetilde{\Sigma}$ as advice, we show that $\widetilde{\mathcal{O}}(d^{2-\beta}/\varepsilon^2)$ samples suffices whenever $|| \widetilde{\Sigma}^{-1/2} \Sigma \widetilde{\Sigma}^{-1/2} - I_d ||_1 \leq \varepsilon d^{1-\beta}$ (where $||\cdot||_1$ denotes the entrywise $\ell_1$ norm) for any $\beta > 0$, yielding a polynomial improvement over the advice-free setting.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-bhattacharyya25a, title = {Learning multivariate {G}aussians with imperfect advice}, author = {Bhattacharyya, Arnab and Choo, Davin and George John, Philips and Gouleakis, Themis}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {4210--4252}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/bhattacharyya25a/bhattacharyya25a.pdf}, url = {https://proceedings.mlr.press/v267/bhattacharyya25a.html}, abstract = {We revisit the problem of distribution learning within the framework of learning-augmented algorithms. In this setting, we explore the scenario where a probability distribution is provided as potentially inaccurate advice on the true, unknown distribution. Our objective is to develop learning algorithms whose sample complexity decreases as the quality of the advice improves, thereby surpassing standard learning lower bounds when the advice is sufficiently accurate. Specifically, we demonstrate that this outcome is achievable for the problem of learning a multivariate Gaussian distribution $N(\mu, \Sigma)$ in the PAC learning setting. Classically, in the advice-free setting, $\widetilde{\Theta}(d^2/\varepsilon^2)$ samples are sufficient and worst case necessary to learn $d$-dimensional Gaussians up to TV distance $\varepsilon$ with constant probability. When we are additionally given a parameter $\widetilde{\Sigma}$ as advice, we show that $\widetilde{\mathcal{O}}(d^{2-\beta}/\varepsilon^2)$ samples suffices whenever $|| \widetilde{\Sigma}^{-1/2} \Sigma \widetilde{\Sigma}^{-1/2} - I_d ||_1 \leq \varepsilon d^{1-\beta}$ (where $||\cdot||_1$ denotes the entrywise $\ell_1$ norm) for any $\beta > 0$, yielding a polynomial improvement over the advice-free setting.} }
Endnote
%0 Conference Paper %T Learning multivariate Gaussians with imperfect advice %A Arnab Bhattacharyya %A Davin Choo %A Philips George John %A Themis Gouleakis %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-bhattacharyya25a %I PMLR %P 4210--4252 %U https://proceedings.mlr.press/v267/bhattacharyya25a.html %V 267 %X We revisit the problem of distribution learning within the framework of learning-augmented algorithms. In this setting, we explore the scenario where a probability distribution is provided as potentially inaccurate advice on the true, unknown distribution. Our objective is to develop learning algorithms whose sample complexity decreases as the quality of the advice improves, thereby surpassing standard learning lower bounds when the advice is sufficiently accurate. Specifically, we demonstrate that this outcome is achievable for the problem of learning a multivariate Gaussian distribution $N(\mu, \Sigma)$ in the PAC learning setting. Classically, in the advice-free setting, $\widetilde{\Theta}(d^2/\varepsilon^2)$ samples are sufficient and worst case necessary to learn $d$-dimensional Gaussians up to TV distance $\varepsilon$ with constant probability. When we are additionally given a parameter $\widetilde{\Sigma}$ as advice, we show that $\widetilde{\mathcal{O}}(d^{2-\beta}/\varepsilon^2)$ samples suffices whenever $|| \widetilde{\Sigma}^{-1/2} \Sigma \widetilde{\Sigma}^{-1/2} - I_d ||_1 \leq \varepsilon d^{1-\beta}$ (where $||\cdot||_1$ denotes the entrywise $\ell_1$ norm) for any $\beta > 0$, yielding a polynomial improvement over the advice-free setting.
APA
Bhattacharyya, A., Choo, D., George John, P. & Gouleakis, T.. (2025). Learning multivariate Gaussians with imperfect advice. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:4210-4252 Available from https://proceedings.mlr.press/v267/bhattacharyya25a.html.

Related Material