On the statistical rate of nonlinear recovery in generative models with heavytailed data
[edit]
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:66976706, 2019.
Abstract
We consider estimating a highdimensional vector from nonlinear measurements where the unknown vector is represented by a generative model $G:\mathbb{R}^k\rightarrow\mathbb{R}^d$ with $k\ll d$. Such a model poses structural priors on the unknown vector without having a dedicated basis, and in particular allows new and efficient approaches solving recovery problems with number of measurements far less than the ambient dimension of the vector. While progresses have been made recently regarding theoretical understandings on the linear Gaussian measurements, much less is known when the model is possibly misspecified and the measurements are nonGaussian. In this paper, we make a step towards such a direction by considering the scenario where the measurements are nonGaussian, subject to possibly unknown nonlinear transformations and the responses are heavytailed. We then propose new estimators via score functions based on the first and second order Stein’s identity, and prove the sample size bound of $m=\mathcal{O}(k\varepsilon^{2}\log(L/\varepsilon))$ achieving an $\varepsilon$ error in the form of exponential concentration inequalities. Furthermore, for the special case of multilayer ReLU generative model, we improve the sample bound by a logarithm factor to $m=\mathcal{O}(k\varepsilon^{2}\log(d))$, matching the stateofart statistical rate in compressed sensing for estimating $k$sparse vectors. On the technical side, we develop new chaining methods bounding heavytailed processes, which could be of independent interest.
Related Material


