Near Optimal Heteroscedastic Regression with Symbiotic Learning

Aniket Das; Dheeraj M. Nagaraj; Praneeth Netrapalli; Dheeraj Baby

Near Optimal Heteroscedastic Regression with Symbiotic Learning

Aniket Das, Dheeraj M. Nagaraj, Praneeth Netrapalli, Dheeraj Baby

Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:3696-3757, 2023.

Abstract

We consider the classical problem of heteroscedastic linear regression where, given

$n$ i.i.d. samples

$(\vx_i, y_i)$ drawn from the model

$y_i = \dotp{\wstar}{\vx_i} + \epsilon_i \cdot \dotp{\fstar}{\vx_i}, \vx_i \sim \cN(0, \vI), \epsilon_i \sim \cN(0, 1)$ , our aim is to \emph{estimate the regressor}

$\wstar$ \emph{without prior knowledge of the noise parameter}

$\fstar$ . In addition to classical applications of such models in statistics \citep{jobson1980least}, econometrics \citep{harvey1976estimating}, time series analysis \citep{engle1982autoregressive} etc., it is also particularly relevant in machine learning problems where data is collected from multiple sources of varying (but apriori unknown) quality, e.g., in the training of large models \citep{devlin2019bert} on web-scale data. In this work, we develop an algorithm called \emph{\ouralg} (short for \emph{Symb}iotic \emph{Learn}ing) which estimates

$\wstar$ in squared norm upto an error of

$\Otilde(\| \fstar \|^2 \cdot (\nicefrac{1}{n} + (\nicefrac{d}{n})^2))$ , and prove that this rate is minimax optimal modulo logarithmic factors. This represents a substantial improvement upon the previous best known upper bound of

$\Otilde(\| \fstar \|^2 \cdot \nicefrac{d}{n})$ . Our algorithm is essentially an alternating minimization procedure which comprises of two key subroutines 1. An adaptation of the classical weighted least squares heuristic to estimate

$\wstar$ (dating back to at least \citet{davidian1987variance}), for which our work presents the first non-asymptotic guarantee; 2. A novel non-convex pseudogradient descent procedure for estimating

$\fstar$ , which draws inspiration from the phase retrieval literature. As corollaries of our analysis, we obtain fast non-asymptotic rates for two important problems, linear regression with multiplicative noise, and phase retrieval with multiplicative noise, both of which could be of independent interest. Beyond this, the proof of our lower bound, which involves a novel adaptation of LeCam’s two point method for handling infinite mutual information quantities (thereby preventing a direct application of standard techniques such as Fano’s method), could also be of broader interest for establishing lower bounds for other heteroscedastic or heavy tailed statistical problems.

Cite this Paper

BibTeX


@InProceedings{pmlr-v195-das23a,
  title = 	 {Near Optimal Heteroscedastic Regression with Symbiotic Learning},
  author =       {Das, Aniket and Nagaraj, Dheeraj M. and Netrapalli, Praneeth and Baby, Dheeraj},
  booktitle = 	 {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages = 	 {3696--3757},
  year = 	 {2023},
  editor = 	 {Neu, Gergely and Rosasco, Lorenzo},
  volume = 	 {195},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {12--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v195/das23a/das23a.pdf},
  url = 	 {https://proceedings.mlr.press/v195/das23a.html},
  abstract = 	 {We consider the classical problem of heteroscedastic linear regression where, given $n$ i.i.d. samples $(\vx_i, y_i)$ drawn from the model $y_i = \dotp{\wstar}{\vx_i} + \epsilon_i \cdot \dotp{\fstar}{\vx_i},  \vx_i \sim \cN(0, \vI),  \epsilon_i \sim \cN(0, 1)$, our aim is to \emph{estimate the regressor} $\wstar$ \emph{without prior knowledge of the noise parameter} $\fstar$. In addition to classical applications of such models in statistics \citep{jobson1980least}, econometrics \citep{harvey1976estimating}, time series analysis \citep{engle1982autoregressive} etc., it is also particularly relevant in machine learning problems where data is collected from multiple sources of varying (but apriori unknown) quality, e.g., in the training of large models \citep{devlin2019bert} on web-scale data. In this work, we develop an algorithm called \emph{\ouralg} (short for \emph{Symb}iotic \emph{Learn}ing) which estimates $\wstar$ in squared norm upto an error of $\Otilde(\| \fstar \|^2 \cdot (\nicefrac{1}{n} + (\nicefrac{d}{n})^2))$, and prove that this rate is minimax optimal modulo logarithmic factors. This represents a substantial improvement upon the previous best known upper bound of $\Otilde(\| \fstar \|^2 \cdot \nicefrac{d}{n})$. Our algorithm is essentially an alternating minimization procedure which comprises of two key subroutines 1. An adaptation of the classical weighted least squares heuristic to estimate $\wstar$ (dating back to at least \citet{davidian1987variance}), for which our work presents the first non-asymptotic guarantee; 2. A novel non-convex pseudogradient descent procedure for estimating $\fstar$, which draws inspiration from the phase retrieval literature. As corollaries of our analysis, we obtain fast non-asymptotic rates for two important problems, linear regression with multiplicative noise, and phase retrieval with multiplicative noise, both of which could be of independent interest. Beyond this, the proof of our lower bound, which involves a novel adaptation of LeCam’s two point method for handling infinite mutual information quantities (thereby preventing a direct application of standard techniques such as Fano’s method), could also be of broader interest for establishing lower bounds for other heteroscedastic or heavy tailed statistical problems.}
}

Endnote

%0 Conference Paper
%T Near Optimal Heteroscedastic Regression with Symbiotic Learning
%A Aniket Das
%A Dheeraj M. Nagaraj
%A Praneeth Netrapalli
%A Dheeraj Baby
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco	
%F pmlr-v195-das23a
%I PMLR
%P 3696--3757
%U https://proceedings.mlr.press/v195/das23a.html
%V 195
%X We consider the classical problem of heteroscedastic linear regression where, given $n$ i.i.d. samples $(\vx_i, y_i)$ drawn from the model $y_i = \dotp{\wstar}{\vx_i} + \epsilon_i \cdot \dotp{\fstar}{\vx_i},  \vx_i \sim \cN(0, \vI),  \epsilon_i \sim \cN(0, 1)$, our aim is to \emph{estimate the regressor} $\wstar$ \emph{without prior knowledge of the noise parameter} $\fstar$. In addition to classical applications of such models in statistics \citep{jobson1980least}, econometrics \citep{harvey1976estimating}, time series analysis \citep{engle1982autoregressive} etc., it is also particularly relevant in machine learning problems where data is collected from multiple sources of varying (but apriori unknown) quality, e.g., in the training of large models \citep{devlin2019bert} on web-scale data. In this work, we develop an algorithm called \emph{\ouralg} (short for \emph{Symb}iotic \emph{Learn}ing) which estimates $\wstar$ in squared norm upto an error of $\Otilde(\| \fstar \|^2 \cdot (\nicefrac{1}{n} + (\nicefrac{d}{n})^2))$, and prove that this rate is minimax optimal modulo logarithmic factors. This represents a substantial improvement upon the previous best known upper bound of $\Otilde(\| \fstar \|^2 \cdot \nicefrac{d}{n})$. Our algorithm is essentially an alternating minimization procedure which comprises of two key subroutines 1. An adaptation of the classical weighted least squares heuristic to estimate $\wstar$ (dating back to at least \citet{davidian1987variance}), for which our work presents the first non-asymptotic guarantee; 2. A novel non-convex pseudogradient descent procedure for estimating $\fstar$, which draws inspiration from the phase retrieval literature. As corollaries of our analysis, we obtain fast non-asymptotic rates for two important problems, linear regression with multiplicative noise, and phase retrieval with multiplicative noise, both of which could be of independent interest. Beyond this, the proof of our lower bound, which involves a novel adaptation of LeCam’s two point method for handling infinite mutual information quantities (thereby preventing a direct application of standard techniques such as Fano’s method), could also be of broader interest for establishing lower bounds for other heteroscedastic or heavy tailed statistical problems.

APA


Das, A., Nagaraj, D.M., Netrapalli, P. & Baby, D.. (2023). Near Optimal Heteroscedastic Regression with Symbiotic Learning. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:3696-3757 Available from https://proceedings.mlr.press/v195/das23a.html.

Related Material

Download PDF