Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes

Seyed Amir Hossein Saberi, Amir Najafi, Abolfazl Motahari, Babak Khalaj
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:29514-29541, 2023.

Abstract

In this paper, we propose sample complexity bounds for learning a simplex from noisy samples. A dataset of size n is given which includes i.i.d. samples drawn from a uniform distribution over an unknown arbitrary simplex in RK, where samples are assumed to be corrupted by a multi-variate additive Gaussian noise of an arbitrary magnitude. We prove the existence of an algorithm that with high probability outputs a simplex having a 2 distance of at most ε from the true simplex (for any ε>0). Also, we theoretically show that in order to achieve this bound, it is sufficient to have n˜Ω(K2/ε2)eΩ(K/SNR2) samples, where SNR stands for the signal-to-noise ratio and is defined as the ratio of the maximum component-wise standard deviation of the simplex (signal) to that of the noise vector. This result solves an important open problem in this area of research, and shows as long as SNRΩ(K) the sample complexity of the noisy regime has the same order to that of the noiseless case. Our proofs are a combination of the so-called sample compression technique in (Ashtiani et al., 2018), mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we have proposed a general Fourier-based technique for recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-saberi23a, title = {Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes}, author = {Saberi, Seyed Amir Hossein and Najafi, Amir and Motahari, Abolfazl and Khalaj, Babak}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {29514--29541}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/saberi23a/saberi23a.pdf}, url = {https://proceedings.mlr.press/v202/saberi23a.html}, abstract = {In this paper, we propose sample complexity bounds for learning a simplex from noisy samples. A dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown arbitrary simplex in $\mathbb{R}^K$, where samples are assumed to be corrupted by a multi-variate additive Gaussian noise of an arbitrary magnitude. We prove the existence of an algorithm that with high probability outputs a simplex having a $\ell_2$ distance of at most $\varepsilon$ from the true simplex (for any $\varepsilon>0$). Also, we theoretically show that in order to achieve this bound, it is sufficient to have $n\ge\tilde{\Omega}\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio and is defined as the ratio of the maximum component-wise standard deviation of the simplex (signal) to that of the noise vector. This result solves an important open problem in this area of research, and shows as long as $\mathrm{SNR}\ge\Omega\left(\sqrt{K}\right)$ the sample complexity of the noisy regime has the same order to that of the noiseless case. Our proofs are a combination of the so-called sample compression technique in (Ashtiani et al., 2018), mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we have proposed a general Fourier-based technique for recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems.} }
Endnote
%0 Conference Paper %T Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes %A Seyed Amir Hossein Saberi %A Amir Najafi %A Abolfazl Motahari %A Babak Khalaj %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-saberi23a %I PMLR %P 29514--29541 %U https://proceedings.mlr.press/v202/saberi23a.html %V 202 %X In this paper, we propose sample complexity bounds for learning a simplex from noisy samples. A dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown arbitrary simplex in $\mathbb{R}^K$, where samples are assumed to be corrupted by a multi-variate additive Gaussian noise of an arbitrary magnitude. We prove the existence of an algorithm that with high probability outputs a simplex having a $\ell_2$ distance of at most $\varepsilon$ from the true simplex (for any $\varepsilon>0$). Also, we theoretically show that in order to achieve this bound, it is sufficient to have $n\ge\tilde{\Omega}\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio and is defined as the ratio of the maximum component-wise standard deviation of the simplex (signal) to that of the noise vector. This result solves an important open problem in this area of research, and shows as long as $\mathrm{SNR}\ge\Omega\left(\sqrt{K}\right)$ the sample complexity of the noisy regime has the same order to that of the noiseless case. Our proofs are a combination of the so-called sample compression technique in (Ashtiani et al., 2018), mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we have proposed a general Fourier-based technique for recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems.
APA
Saberi, S.A.H., Najafi, A., Motahari, A. & Khalaj, B.. (2023). Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:29514-29541 Available from https://proceedings.mlr.press/v202/saberi23a.html.

Related Material