Basis Learning as an Algorithmic Primitive

Mikhail Belkin; Luis Rademacher; James Voss

Basis Learning as an Algorithmic Primitive

Mikhail Belkin, Luis Rademacher, James Voss

29th Annual Conference on Learning Theory, PMLR 49:446-487, 2016.

Abstract

A number of important problems in theoretical computer science and machine learning can be interpreted as recovering a certain basis. These include symmetric matrix eigendecomposition, certain tensor decompositions, Independent Component Analysis (ICA), spectral clustering and Gaussian mixture learning. Each of these problems reduces to an instance of our general model, which we call a “Basis Encoding Function" (BEF). We show that learning a basis within this model can then be provably and efficiently achieved using a first order iteration algorithm (gradient iteration). Our algorithm goes beyond tensor methods while generalizing a number of existing algorithms—e.g., the power method for symmetric matrices, the tensor power iteration for orthogonal decomposable tensors, and cumulant-based FastICA—all within a broader function-based dynamical systems framework. Our framework also unifies the unusual phenomenon observed in these domains that they can be solved using efficient non-convex optimization. Specifically, we describe a class of BEFs such that their local maxima on the unit sphere are in one-to-one correspondence with the basis elements. This description relies on a certain “hidden convexity" property of these functions. We provide a complete theoretical analysis of the gradient iteration even when the BEF is perturbed. We show convergence and complexity bounds polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered as a non-linear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices. In addition we show that our algorithm exhibits fast (superlinear) convergence and relate the speed of convergence to the properties of the BEF. Moreover, the gradient iteration algorithm can be easily and efficiently implemented in practice.

Cite this Paper

BibTeX


@InProceedings{pmlr-v49-belkin16,
  title = 	 {Basis Learning as an Algorithmic Primitive},
  author = 	 {Belkin, Mikhail and Rademacher, Luis and Voss, James},
  booktitle = 	 {29th Annual Conference on Learning Theory},
  pages = 	 {446--487},
  year = 	 {2016},
  editor = 	 {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume = 	 {49},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Columbia University, New York, New York, USA},
  month = 	 {23--26 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v49/belkin16.pdf},
  url = 	 {https://proceedings.mlr.press/v49/belkin16.html},
  abstract = 	 {A number of important problems in theoretical computer science and machine learning can be interpreted as recovering a certain basis. These include  symmetric matrix eigendecomposition, certain tensor decompositions, Independent Component Analysis (ICA), spectral clustering and Gaussian mixture learning. Each of these problems reduces to an instance of our general model, which we call a “Basis Encoding Function" (BEF). We show that learning a basis within this model can then be provably and efficiently achieved using a first order  iteration algorithm (gradient iteration). Our algorithm goes beyond tensor methods while generalizing a number of existing algorithms—e.g., the power method for symmetric matrices, the tensor power iteration for orthogonal decomposable tensors, and cumulant-based FastICA—all within a broader function-based dynamical systems framework. Our framework also unifies the unusual phenomenon observed in these domains that they can be solved using efficient non-convex optimization. Specifically, we describe a class of BEFs such that their local maxima on the unit sphere are in one-to-one correspondence with the basis elements. This description relies on a certain “hidden convexity" property of these functions. We provide a complete theoretical analysis of the gradient iteration even when the BEF is perturbed. We show convergence and complexity bounds polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered as a  non-linear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices. In addition we show that   our algorithm exhibits fast (superlinear) convergence and relate the speed of convergence to the properties of the BEF. Moreover, the gradient iteration algorithm can be easily and efficiently implemented in practice.}
}

Endnote

%0 Conference Paper
%T Basis Learning as an Algorithmic Primitive
%A Mikhail Belkin
%A Luis Rademacher
%A James Voss
%B 29th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2016
%E Vitaly Feldman
%E Alexander Rakhlin
%E Ohad Shamir	
%F pmlr-v49-belkin16
%I PMLR
%P 446--487
%U https://proceedings.mlr.press/v49/belkin16.html
%V 49
%X A number of important problems in theoretical computer science and machine learning can be interpreted as recovering a certain basis. These include  symmetric matrix eigendecomposition, certain tensor decompositions, Independent Component Analysis (ICA), spectral clustering and Gaussian mixture learning. Each of these problems reduces to an instance of our general model, which we call a “Basis Encoding Function" (BEF). We show that learning a basis within this model can then be provably and efficiently achieved using a first order  iteration algorithm (gradient iteration). Our algorithm goes beyond tensor methods while generalizing a number of existing algorithms—e.g., the power method for symmetric matrices, the tensor power iteration for orthogonal decomposable tensors, and cumulant-based FastICA—all within a broader function-based dynamical systems framework. Our framework also unifies the unusual phenomenon observed in these domains that they can be solved using efficient non-convex optimization. Specifically, we describe a class of BEFs such that their local maxima on the unit sphere are in one-to-one correspondence with the basis elements. This description relies on a certain “hidden convexity" property of these functions. We provide a complete theoretical analysis of the gradient iteration even when the BEF is perturbed. We show convergence and complexity bounds polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered as a  non-linear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices. In addition we show that   our algorithm exhibits fast (superlinear) convergence and relate the speed of convergence to the properties of the BEF. Moreover, the gradient iteration algorithm can be easily and efficiently implemented in practice.

RIS


TY  - CPAPER
TI  - Basis Learning as an Algorithmic Primitive
AU  - Mikhail Belkin
AU  - Luis Rademacher
AU  - James Voss
BT  - 29th Annual Conference on Learning Theory
DA  - 2016/06/06
ED  - Vitaly Feldman
ED  - Alexander Rakhlin
ED  - Ohad Shamir	
ID  - pmlr-v49-belkin16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 49
SP  - 446
EP  - 487
L1  - http://proceedings.mlr.press/v49/belkin16.pdf
UR  - https://proceedings.mlr.press/v49/belkin16.html
AB  - A number of important problems in theoretical computer science and machine learning can be interpreted as recovering a certain basis. These include  symmetric matrix eigendecomposition, certain tensor decompositions, Independent Component Analysis (ICA), spectral clustering and Gaussian mixture learning. Each of these problems reduces to an instance of our general model, which we call a “Basis Encoding Function" (BEF). We show that learning a basis within this model can then be provably and efficiently achieved using a first order  iteration algorithm (gradient iteration). Our algorithm goes beyond tensor methods while generalizing a number of existing algorithms—e.g., the power method for symmetric matrices, the tensor power iteration for orthogonal decomposable tensors, and cumulant-based FastICA—all within a broader function-based dynamical systems framework. Our framework also unifies the unusual phenomenon observed in these domains that they can be solved using efficient non-convex optimization. Specifically, we describe a class of BEFs such that their local maxima on the unit sphere are in one-to-one correspondence with the basis elements. This description relies on a certain “hidden convexity" property of these functions. We provide a complete theoretical analysis of the gradient iteration even when the BEF is perturbed. We show convergence and complexity bounds polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered as a  non-linear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices. In addition we show that   our algorithm exhibits fast (superlinear) convergence and relate the speed of convergence to the properties of the BEF. Moreover, the gradient iteration algorithm can be easily and efficiently implemented in practice.
ER  -

APA


Belkin, M., Rademacher, L. & Voss, J.. (2016). Basis Learning as an Algorithmic Primitive. 29th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 49:446-487 Available from https://proceedings.mlr.press/v49/belkin16.html.

Related Material

Download PDF