Active Regression via Linear-Sample Sparsification

Xue Chen; Eric Price

Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:663-695, 2019.

Abstract

We present an approach that improves the sample complexity for a variety of curve fitting problems, including active learning for linear regression, polynomial regression, and continuous sparse Fourier transforms. In the active linear regression problem, one would like to estimate the least squares solution $\beta^*$ minimizing $\|X\beta - y\|_2$ given the entire unlabeled dataset $X \in \mathbb{R}^{n \times d}$ but only observing a small number of labels $y_i$. We show that $O(d)$ labels suffice to find a constant factor approximation $\widetilde{\beta}$: \[ \mathbb{E}[\|X \widetilde{\beta} - y\|_2^2] \leq 2 \mathbb{E}[\|X \beta^* - y\|_2^2]. \] This improves on the best previous result of $O(d \log d)$ from leverage score sampling. We also present results for the \emph{inductive} setting, showing when $\widetilde{\beta}$ will generalize to fresh samples; these apply to continuous settings such as polynomial regression. Finally, we show how the techniques yield improved results for the non-linear sparse Fourier transform setting.
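To make the setting concrete, here is a minimal Python sketch of the leverage score sampling baseline that the abstract cites as the previous $O(d \log d)$ result. It is not the linear-sample procedure of the paper. The function names, the `query_label` callback, and the choice of sampling with replacement are assumptions made for illustration only.

```python
import numpy as np


def leverage_scores(X):
    """Return the leverage score of every row of X (they sum to rank(X))."""
    # Thin SVD: the squared row norms of U are the leverage scores.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U ** 2, axis=1)


def active_regression_leverage(X, query_label, m, rng):
    """Estimate a least squares solution from m label queries.

    query_label(i) is a hypothetical callback returning the label y_i;
    it is the only way this routine touches the labels.
    """
    n, _ = X.shape
    p = leverage_scores(X)
    p = p / p.sum()  # sampling distribution over rows
    idx = rng.choice(n, size=m, replace=True, p=p)
    # Importance weights so the sampled objective is unbiased for ||X b - y||^2.
    w = 1.0 / np.sqrt(m * p[idx])
    y_s = np.array([query_label(int(i)) for i in idx])
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y_s * w, rcond=None)
    return beta, idx
```

A call such as `active_regression_leverage(X, lambda i: y[i], m=200, rng=np.random.default_rng(0))` queries at most 200 labels; comparing $\|X\widetilde{\beta} - y\|_2^2$ with $\|X\beta^* - y\|_2^2$ for the full least squares solution gives an empirical view of the approximation factor.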

Cite this Paper

BibTeX

@InProceedings{pmlr-v99-chen19a,
  title = 	 {Active Regression via Linear-Sample Sparsification},
  author =       {Chen, Xue and Price, Eric},
  booktitle = 	 {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages = 	 {663--695},
  year = 	 {2019},
  editor = 	 {Beygelzimer, Alina and Hsu, Daniel},
  volume = 	 {99},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--28 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v99/chen19a/chen19a.pdf},
  url = 	 {https://proceedings.mlr.press/v99/chen19a.html},
  abstract = 	 {We present an approach that improves the sample complexity for a variety of curve fitting problems, including active learning for linear regression, polynomial regression, and continuous sparse Fourier transforms. In the active linear regression problem, one would like to estimate the least squares solution $\beta^*$ minimizing $\|X\beta - y\|_2$ given the entire unlabeled dataset $X \in \mathbb{R}^{n \times d}$ but only observing a small number of labels $y_i$. We show that $O(d)$ labels suffice to find a constant factor approximation $\widetilde{\beta}$: \[ \mathbb{E}[\|X \widetilde{\beta} - y\|_2^2] \leq 2 \mathbb{E}[\|X \beta^* - y\|_2^2]. \] This improves on the best previous result of $O(d \log d)$ from leverage score sampling. We also present results for the \emph{inductive} setting, showing when $\widetilde{\beta}$ will generalize to fresh samples; these apply to continuous settings such as polynomial regression. Finally, we show how the techniques yield improved results for the non-linear sparse Fourier transform setting.}
}

Endnote

%0 Conference Paper
%T Active Regression via Linear-Sample Sparsification
%A Xue Chen
%A Eric Price
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu
%F pmlr-v99-chen19a
%I PMLR
%P 663--695
%U https://proceedings.mlr.press/v99/chen19a.html
%V 99
%X We present an approach that improves the sample complexity for a variety of curve fitting problems, including active learning for linear regression, polynomial regression, and continuous sparse Fourier transforms. In the active linear regression problem, one would like to estimate the least squares solution $\beta^*$ minimizing $\|X\beta - y\|_2$ given the entire unlabeled dataset $X \in \mathbb{R}^{n \times d}$ but only observing a small number of labels $y_i$. We show that $O(d)$ labels suffice to find a constant factor approximation $\widetilde{\beta}$: \[ \mathbb{E}[\|X \widetilde{\beta} - y\|_2^2] \leq 2 \mathbb{E}[\|X \beta^* - y\|_2^2]. \] This improves on the best previous result of $O(d \log d)$ from leverage score sampling. We also present results for the \emph{inductive} setting, showing when $\widetilde{\beta}$ will generalize to fresh samples; these apply to continuous settings such as polynomial regression. Finally, we show how the techniques yield improved results for the non-linear sparse Fourier transform setting.

APA

Chen, X., & Price, E. (2019). Active Regression via Linear-Sample Sparsification. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:663-695. Available from https://proceedings.mlr.press/v99/chen19a.html.
