Dropping Convexity for Faster Semi-definite Optimization

Srinadh Bhojanapalli; Anastasios Kyrillidis; Sujay Sanghavi

Dropping Convexity for Faster Semi-definite Optimization

Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi

29th Annual Conference on Learning Theory, PMLR 49:530-582, 2016.

Abstract

We study the minimization of a convex function f(X) over the set of n \times n positive semi-definite matrices, but when the problem is recast as \min_U g(U) := f(UU^⊤), with U ∈\mathbbR^n \times r and r ≤n. We study the performance of gradient descent on g—which we refer to as Factored Gradient Descent (\textscFgd)—under standard assumptions on the \em original function f. We provide a rule for selecting the step size and, with this choice, show that the \emphlocal convergence rate of \textscFgd mirrors that of standard gradient descent on the original f: \emphi.e., after k steps, the error is O(1/k) for smooth f, and exponentially small in k when f is (restricted) strongly convex. In addition, we provide a procedure to initialize \textscFgd for (restricted) strongly convex objectives and when one only has access to f via a first-order oracle; for several problem instances, such proper initialization leads to \emphglobal convergence guarantees. \textscFgd and similar procedures are widely used in practice for problems that can be posed as matrix factorization. To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.

Cite this Paper

BibTeX


@InProceedings{pmlr-v49-bhojanapalli16,
  title = 	 {Dropping Convexity for Faster Semi-definite Optimization},
  author = 	 {Bhojanapalli, Srinadh and Kyrillidis, Anastasios and Sanghavi, Sujay},
  booktitle = 	 {29th Annual Conference on Learning Theory},
  pages = 	 {530--582},
  year = 	 {2016},
  editor = 	 {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume = 	 {49},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Columbia University, New York, New York, USA},
  month = 	 {23--26 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v49/bhojanapalli16.pdf},
  url = 	 {https://proceedings.mlr.press/v49/bhojanapalli16.html},
  abstract = 	 {We study the minimization of a convex function f(X) over the set of n \times n positive semi-definite matrices, but when the problem is recast as \min_U g(U) :=  f(UU^⊤), with U ∈\mathbbR^n \times r and r ≤n. We study the performance of gradient descent on g—which we refer to as Factored Gradient Descent (\textscFgd)—under standard assumptions on the \em original function f. We provide a rule for selecting the step size and, with this choice, show that the \emphlocal convergence rate of \textscFgd mirrors that of standard gradient descent on the original f: \emphi.e., after k steps, the error is O(1/k) for smooth f, and exponentially small in k when f is (restricted) strongly convex. In addition, we provide a procedure to initialize \textscFgd for (restricted) strongly convex objectives and when one only has access to f via a first-order oracle; for several problem instances, such proper initialization leads to \emphglobal convergence guarantees. \textscFgd and similar procedures are widely used in practice for problems that can be posed as matrix factorization. To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.}
}

Endnote

%0 Conference Paper
%T Dropping Convexity for Faster Semi-definite Optimization
%A Srinadh Bhojanapalli
%A Anastasios Kyrillidis
%A Sujay Sanghavi
%B 29th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2016
%E Vitaly Feldman
%E Alexander Rakhlin
%E Ohad Shamir	
%F pmlr-v49-bhojanapalli16
%I PMLR
%P 530--582
%U https://proceedings.mlr.press/v49/bhojanapalli16.html
%V 49
%X We study the minimization of a convex function f(X) over the set of n \times n positive semi-definite matrices, but when the problem is recast as \min_U g(U) :=  f(UU^⊤), with U ∈\mathbbR^n \times r and r ≤n. We study the performance of gradient descent on g—which we refer to as Factored Gradient Descent (\textscFgd)—under standard assumptions on the \em original function f. We provide a rule for selecting the step size and, with this choice, show that the \emphlocal convergence rate of \textscFgd mirrors that of standard gradient descent on the original f: \emphi.e., after k steps, the error is O(1/k) for smooth f, and exponentially small in k when f is (restricted) strongly convex. In addition, we provide a procedure to initialize \textscFgd for (restricted) strongly convex objectives and when one only has access to f via a first-order oracle; for several problem instances, such proper initialization leads to \emphglobal convergence guarantees. \textscFgd and similar procedures are widely used in practice for problems that can be posed as matrix factorization. To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.

RIS


TY  - CPAPER
TI  - Dropping Convexity for Faster Semi-definite Optimization
AU  - Srinadh Bhojanapalli
AU  - Anastasios Kyrillidis
AU  - Sujay Sanghavi
BT  - 29th Annual Conference on Learning Theory
DA  - 2016/06/06
ED  - Vitaly Feldman
ED  - Alexander Rakhlin
ED  - Ohad Shamir	
ID  - pmlr-v49-bhojanapalli16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 49
SP  - 530
EP  - 582
L1  - http://proceedings.mlr.press/v49/bhojanapalli16.pdf
UR  - https://proceedings.mlr.press/v49/bhojanapalli16.html
AB  - We study the minimization of a convex function f(X) over the set of n \times n positive semi-definite matrices, but when the problem is recast as \min_U g(U) :=  f(UU^⊤), with U ∈\mathbbR^n \times r and r ≤n. We study the performance of gradient descent on g—which we refer to as Factored Gradient Descent (\textscFgd)—under standard assumptions on the \em original function f. We provide a rule for selecting the step size and, with this choice, show that the \emphlocal convergence rate of \textscFgd mirrors that of standard gradient descent on the original f: \emphi.e., after k steps, the error is O(1/k) for smooth f, and exponentially small in k when f is (restricted) strongly convex. In addition, we provide a procedure to initialize \textscFgd for (restricted) strongly convex objectives and when one only has access to f via a first-order oracle; for several problem instances, such proper initialization leads to \emphglobal convergence guarantees. \textscFgd and similar procedures are widely used in practice for problems that can be posed as matrix factorization. To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.
ER  -

APA


Bhojanapalli, S., Kyrillidis, A. & Sanghavi, S.. (2016). Dropping Convexity for Faster Semi-definite Optimization. 29th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 49:530-582 Available from https://proceedings.mlr.press/v49/bhojanapalli16.html.

Related Material

Download PDF