Computing High-dimensional Confidence Sets for Arbitrary Distributions

Chao Gao, Liren Shan, Vaidehi Srinivas, Aravindan Vijayaraghavan
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:2227-2269, 2025.

Abstract

We study the problem of learning a high-density region of an arbitrary distribution over $\mathbb{R}^d$. Given a target coverage parameter $\delta$, and sample access to an arbitrary distribution $\mathcal{D}$, we want to output a confidence set $S \subset \mathbb{R}^d$ such that $S$ achieves $\delta$ coverage of $\mathcal{D}$, i.e., $\mathbb{P}_{y \sim \mathcal{D}} \left[ y \in S \right] \ge \delta$, and the volume of $S$ is as small as possible. This is a central problem in high-dimensional statistics with applications in high-dimensional analogues of finding confidence intervals, uncertainty quantification, and support estimation. In the most general setting, this problem is statistically intractable, so we restrict our attention to competing with sets from a concept class $\mathcal{C}$ with bounded VC-dimension. An algorithm for learning confidence sets is competitive with class $\mathcal{C}$ if, given samples from an arbitrary distribution $\mathcal{D}$, it outputs in polynomial time a set that achieves $\delta$ coverage of $\mathcal{D}$, and whose volume is competitive with the smallest set in $\mathcal{C}$ with the required coverage $\delta$. This problem is computationally challenging even in the basic setting when $\mathcal{C}$ is the set of all Euclidean balls. Existing algorithms based on coresets find in polynomial time a ball whose volume is $\exp(\tilde{O}( d/ \log d))$-factor competitive with the volume of the best ball. Our main result is an algorithm that finds a confidence set whose volume is $\exp(\tilde{O}(d^{1/2}))$ factor competitive with the optimal ball having the desired coverage. It is surprisingly simple and also extends to finding confidence sets competitive against unions of $k$ balls, and improved guarantees under additional assumptions. The algorithm is improper (it outputs an ellipsoid). Combined with our computational intractability result for proper learning balls within an $\exp(\tilde{O}(d^{1-o(1)}))$ approximation factor in volume, our results provide an interesting separation between proper and (improper) learning of confidence sets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-gao25b, title = {Computing High-dimensional Confidence Sets for Arbitrary Distributions}, author = {Gao, Chao and Shan, Liren and Srinivas, Vaidehi and Vijayaraghavan, Aravindan}, booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory}, pages = {2227--2269}, year = {2025}, editor = {Haghtalab, Nika and Moitra, Ankur}, volume = {291}, series = {Proceedings of Machine Learning Research}, month = {30 Jun--04 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/gao25b/gao25b.pdf}, url = {https://proceedings.mlr.press/v291/gao25b.html}, abstract = {We study the problem of learning a high-density region of an arbitrary distribution over $\mathbb{R}^d$. Given a target coverage parameter $\delta$, and sample access to an arbitrary distribution $\mathcal{D}$, we want to output a confidence set $S \subset \mathbb{R}^d$ such that $S$ achieves $\delta$ coverage of $\mathcal{D}$, i.e., $\mathbb{P}_{y \sim \mathcal{D}} \left[ y \in S \right] \ge \delta$, and the volume of $S$ is as small as possible. This is a central problem in high-dimensional statistics with applications in high-dimensional analogues of finding confidence intervals, uncertainty quantification, and support estimation. In the most general setting, this problem is statistically intractable, so we restrict our attention to competing with sets from a concept class $\mathcal{C}$ with bounded VC-dimension. An algorithm for learning confidence sets is competitive with class $\mathcal{C}$ if, given samples from an arbitrary distribution $\mathcal{D}$, it outputs in polynomial time a set that achieves $\delta$ coverage of $\mathcal{D}$, and whose volume is competitive with the smallest set in $\mathcal{C}$ with the required coverage $\delta$. This problem is computationally challenging even in the basic setting when $\mathcal{C}$ is the set of all Euclidean balls. Existing algorithms based on coresets find in polynomial time a ball whose volume is $\exp(\tilde{O}( d/ \log d))$-factor competitive with the volume of the best ball. Our main result is an algorithm that finds a confidence set whose volume is $\exp(\tilde{O}(d^{1/2}))$ factor competitive with the optimal ball having the desired coverage. It is surprisingly simple and also extends to finding confidence sets competitive against unions of $k$ balls, and improved guarantees under additional assumptions. The algorithm is improper (it outputs an ellipsoid). Combined with our computational intractability result for proper learning balls within an $\exp(\tilde{O}(d^{1-o(1)}))$ approximation factor in volume, our results provide an interesting separation between proper and (improper) learning of confidence sets.} }
Endnote
%0 Conference Paper %T Computing High-dimensional Confidence Sets for Arbitrary Distributions %A Chao Gao %A Liren Shan %A Vaidehi Srinivas %A Aravindan Vijayaraghavan %B Proceedings of Thirty Eighth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2025 %E Nika Haghtalab %E Ankur Moitra %F pmlr-v291-gao25b %I PMLR %P 2227--2269 %U https://proceedings.mlr.press/v291/gao25b.html %V 291 %X We study the problem of learning a high-density region of an arbitrary distribution over $\mathbb{R}^d$. Given a target coverage parameter $\delta$, and sample access to an arbitrary distribution $\mathcal{D}$, we want to output a confidence set $S \subset \mathbb{R}^d$ such that $S$ achieves $\delta$ coverage of $\mathcal{D}$, i.e., $\mathbb{P}_{y \sim \mathcal{D}} \left[ y \in S \right] \ge \delta$, and the volume of $S$ is as small as possible. This is a central problem in high-dimensional statistics with applications in high-dimensional analogues of finding confidence intervals, uncertainty quantification, and support estimation. In the most general setting, this problem is statistically intractable, so we restrict our attention to competing with sets from a concept class $\mathcal{C}$ with bounded VC-dimension. An algorithm for learning confidence sets is competitive with class $\mathcal{C}$ if, given samples from an arbitrary distribution $\mathcal{D}$, it outputs in polynomial time a set that achieves $\delta$ coverage of $\mathcal{D}$, and whose volume is competitive with the smallest set in $\mathcal{C}$ with the required coverage $\delta$. This problem is computationally challenging even in the basic setting when $\mathcal{C}$ is the set of all Euclidean balls. Existing algorithms based on coresets find in polynomial time a ball whose volume is $\exp(\tilde{O}( d/ \log d))$-factor competitive with the volume of the best ball. Our main result is an algorithm that finds a confidence set whose volume is $\exp(\tilde{O}(d^{1/2}))$ factor competitive with the optimal ball having the desired coverage. It is surprisingly simple and also extends to finding confidence sets competitive against unions of $k$ balls, and improved guarantees under additional assumptions. The algorithm is improper (it outputs an ellipsoid). Combined with our computational intractability result for proper learning balls within an $\exp(\tilde{O}(d^{1-o(1)}))$ approximation factor in volume, our results provide an interesting separation between proper and (improper) learning of confidence sets.
APA
Gao, C., Shan, L., Srinivas, V. & Vijayaraghavan, A.. (2025). Computing High-dimensional Confidence Sets for Arbitrary Distributions. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:2227-2269 Available from https://proceedings.mlr.press/v291/gao25b.html.

Related Material