An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection

Tianbao Yang; Lijun Zhang; Rong Jin; Shenghuo Zhu

An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection

Tianbao Yang, Lijun Zhang, Rong Jin, Shenghuo Zhu

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:135-143, 2015.

Abstract

In this paper, we consider the problem of column subset selection. We present a novel analysis of the spectral norm reconstruction for a simple randomized algorithm and establish a new bound that depends explicitly on the sampling probabilities. The sampling dependent error bound (i) allows us to better understand the tradeoff in the reconstruction error due to sampling probabilities, (ii) exhibits more insights than existing error bounds that exploit specific probability distributions, and (iii) implies better sampling distributions. In particular, we show that a sampling distribution with probabilities proportional to the square root of the statistical leverage scores is better than uniform sampling, and is better than leverage-based sampling when the statistical leverage scores are very nonuniform. And by solving a constrained optimization problem related to the error bound with an efficient bisection search we are able to achieve better performance than using either the leverage-based distribution or that proportional to the square root of the statistical leverage scores. Numerical simulations demonstrate the benefits of the new sampling distributions for low-rank matrix approximation and least square approximation compared to state-of-the art algorithms.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-yanga15,
  title = 	 {An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection},
  author = 	 {Yang, Tianbao and Zhang, Lijun and Jin, Rong and Zhu, Shenghuo},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {135--143},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/yanga15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/yanga15.html},
  abstract = 	 {In this paper, we consider the problem of column subset selection. We present a novel analysis of the spectral norm reconstruction for a simple randomized algorithm and establish a new bound that depends explicitly on the sampling probabilities. The sampling dependent error bound (i) allows us to better understand the tradeoff in the reconstruction error due to sampling probabilities, (ii) exhibits more insights than existing error bounds that exploit specific probability distributions, and (iii) implies better sampling distributions. In particular, we show that a sampling distribution with probabilities proportional to the square root of the statistical leverage scores is better than uniform sampling, and is better than leverage-based sampling when the statistical leverage scores are very nonuniform. And by solving a constrained optimization problem related to the error bound with an efficient bisection search we are able to achieve better performance than using either the leverage-based distribution or that proportional to the square root of the statistical leverage scores. Numerical simulations demonstrate the benefits of the new sampling distributions for low-rank matrix approximation and least square approximation compared to state-of-the art algorithms.}
}

Endnote

%0 Conference Paper
%T An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection
%A Tianbao Yang
%A Lijun Zhang
%A Rong Jin
%A Shenghuo Zhu
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-yanga15
%I PMLR
%P 135--143
%U https://proceedings.mlr.press/v37/yanga15.html
%V 37
%X In this paper, we consider the problem of column subset selection. We present a novel analysis of the spectral norm reconstruction for a simple randomized algorithm and establish a new bound that depends explicitly on the sampling probabilities. The sampling dependent error bound (i) allows us to better understand the tradeoff in the reconstruction error due to sampling probabilities, (ii) exhibits more insights than existing error bounds that exploit specific probability distributions, and (iii) implies better sampling distributions. In particular, we show that a sampling distribution with probabilities proportional to the square root of the statistical leverage scores is better than uniform sampling, and is better than leverage-based sampling when the statistical leverage scores are very nonuniform. And by solving a constrained optimization problem related to the error bound with an efficient bisection search we are able to achieve better performance than using either the leverage-based distribution or that proportional to the square root of the statistical leverage scores. Numerical simulations demonstrate the benefits of the new sampling distributions for low-rank matrix approximation and least square approximation compared to state-of-the art algorithms.

RIS


TY  - CPAPER
TI  - An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection
AU  - Tianbao Yang
AU  - Lijun Zhang
AU  - Rong Jin
AU  - Shenghuo Zhu
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-yanga15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 135
EP  - 143
L1  - http://proceedings.mlr.press/v37/yanga15.pdf
UR  - https://proceedings.mlr.press/v37/yanga15.html
AB  - In this paper, we consider the problem of column subset selection. We present a novel analysis of the spectral norm reconstruction for a simple randomized algorithm and establish a new bound that depends explicitly on the sampling probabilities. The sampling dependent error bound (i) allows us to better understand the tradeoff in the reconstruction error due to sampling probabilities, (ii) exhibits more insights than existing error bounds that exploit specific probability distributions, and (iii) implies better sampling distributions. In particular, we show that a sampling distribution with probabilities proportional to the square root of the statistical leverage scores is better than uniform sampling, and is better than leverage-based sampling when the statistical leverage scores are very nonuniform. And by solving a constrained optimization problem related to the error bound with an efficient bisection search we are able to achieve better performance than using either the leverage-based distribution or that proportional to the square root of the statistical leverage scores. Numerical simulations demonstrate the benefits of the new sampling distributions for low-rank matrix approximation and least square approximation compared to state-of-the art algorithms.
ER  -

APA


Yang, T., Zhang, L., Jin, R. & Zhu, S.. (2015). An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:135-143 Available from https://proceedings.mlr.press/v37/yanga15.html.

Related Material

Download PDF