Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data

Jialei Wang, Jason Lee, Mehrdad Mahdavi, Mladen Kolar, Nati Srebro
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1150-1158, 2017.

Abstract

Sketching techniques scale up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, without sacrificing their statistical properties. In this paper, we study sketching from an optimization point of view. We first show that the iterative Hessian sketch is an optimization process with preconditioning and develop an accelerated version using this insight together with conjugate gradient descent. Next, we establish a primal-dual connection between the Hessian sketch and dual random projection, which allows us to develop an accelerated iterative dual random projection method by applying the preconditioned conjugate gradient descent on the dual problem. Finally, we tackle the problems of large sample size and high-dimensionality in massive data sets by developing the primal-dual sketch. The primal-dual sketch iteratively sketches the primal and dual formulations and requires only a logarithmic number of calls to solvers of small sub-problems to recover the optimum of the original problem up to arbitrary precision. Extensive experiments on synthetic and real data sets complement our theoretical results.
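To make the "iterative Hessian sketch as preconditioned optimization" idea concrete, the following is a minimal NumPy sketch of the iterative Hessian sketch applied to overdetermined least squares; it is an illustrative toy (Gaussian sketching matrix, plain Newton-style updates, no conjugate-gradient acceleration), not the authors' implementation, and all names are our own.

```python
import numpy as np

def iterative_hessian_sketch(A, b, m, iters=10, seed=None):
    """Iterative Hessian sketch for min_x ||A x - b||^2 with A of size n x d (n >> d).

    Each iteration draws a fresh Gaussian sketch S (m x n), replaces the exact
    Hessian A^T A by the sketched Hessian (SA)^T (SA), and takes a Newton-style
    step using the exact gradient of the full problem. Only the small m x d
    sketched matrix enters the linear solve.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        S = rng.standard_normal((m, n)) / np.sqrt(m)  # fresh sketch each round
        SA = S @ A                                    # sketched data, m x d
        grad = A.T @ (A @ x - b)                      # exact full-data gradient
        # Sketched Newton system: (A^T S^T S A) dx = grad
        dx = np.linalg.solve(SA.T @ SA, grad)
        x = x - dx
    return x
```

On a noiseless synthetic problem this toy recovers the least-squares optimum to high accuracy after a modest number of iterations, illustrating the geometric error decay the paper exploits; the accelerated variants in the paper improve on this plain iteration via preconditioned conjugate gradient.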

Cite this Paper


BibTeX
@InProceedings{pmlr-v54-wang17d,
  title     = {{Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data}},
  author    = {Wang, Jialei and Lee, Jason and Mahdavi, Mehrdad and Kolar, Mladen and Srebro, Nati},
  booktitle = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics},
  pages     = {1150--1158},
  year      = {2017},
  editor    = {Singh, Aarti and Zhu, Jerry},
  volume    = {54},
  series    = {Proceedings of Machine Learning Research},
  month     = {20--22 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v54/wang17d/wang17d.pdf},
  url       = {https://proceedings.mlr.press/v54/wang17d.html},
  abstract  = {Sketching techniques scale up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, without sacrificing their statistical properties. In this paper, we study sketching from an optimization point of view. We first show that the iterative Hessian sketch is an optimization process with \emph{preconditioning} and develop an \emph{accelerated} version using this insight together with conjugate gradient descent. Next, we establish a primal-dual connection between the Hessian sketch and dual random projection, which allows us to develop an \emph{accelerated iterative dual random projection} method by applying the preconditioned conjugate gradient descent on the dual problem. Finally, we tackle the problems of large sample size and high-dimensionality in massive data sets by developing the \emph{primal-dual sketch}. The primal-dual sketch iteratively sketches the primal and dual formulations and requires only a logarithmic number of calls to solvers of small sub-problems to recover the optimum of the original problem up to \emph{arbitrary precision}. Extensive experiments on synthetic and real data sets complement our theoretical results.}
}
Endnote
%0 Conference Paper
%T Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data
%A Jialei Wang
%A Jason Lee
%A Mehrdad Mahdavi
%A Mladen Kolar
%A Nati Srebro
%B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2017
%E Aarti Singh
%E Jerry Zhu
%F pmlr-v54-wang17d
%I PMLR
%P 1150--1158
%U https://proceedings.mlr.press/v54/wang17d.html
%V 54
%X Sketching techniques scale up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, without sacrificing their statistical properties. In this paper, we study sketching from an optimization point of view. We first show that the iterative Hessian sketch is an optimization process with preconditioning and develop an accelerated version using this insight together with conjugate gradient descent. Next, we establish a primal-dual connection between the Hessian sketch and dual random projection, which allows us to develop an accelerated iterative dual random projection method by applying the preconditioned conjugate gradient descent on the dual problem. Finally, we tackle the problems of large sample size and high-dimensionality in massive data sets by developing the primal-dual sketch. The primal-dual sketch iteratively sketches the primal and dual formulations and requires only a logarithmic number of calls to solvers of small sub-problems to recover the optimum of the original problem up to arbitrary precision. Extensive experiments on synthetic and real data sets complement our theoretical results.
APA
Wang, J., Lee, J., Mahdavi, M., Kolar, M. & Srebro, N. (2017). Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 54:1150-1158. Available from https://proceedings.mlr.press/v54/wang17d.html.