[edit]
On the Regret Minimization of Nonconvex Online Gradient Ascent for Online PCA
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1349-1373, 2019.
Abstract
In this paper we focus on the problem of Online Principal Component Analysis in the regret minimization framework. For this problem, all existing regret minimization algorithms for the fully-adversarial setting are based on a positive semidefinite convex relaxation, and hence require quadratic memory and SVD computation (either thin of full) on each iteration, which amounts to at least quadratic runtime per iteration. This is in stark contrast to a corresponding stochastic i.i.d. variant of the problem, which was studied extensively lately, and admits very efficient gradient ascent algorithms that work directly on the natural non-convex formulation of the problem, and hence require only linear memory and linear runtime per iteration. This raises the question: \textit{can non-convex online gradient ascent algorithms be shown to minimize regret in online adversarial settings?} In this paper we take a step forward towards answering this question. We introduce an \textit{adversarially-perturbed spiked-covariance model} in which, each data point is assumed to follow a fixed stochastic distribution with a non-zero spectral gap in the covariance matrix, but is then perturbed with some adversarial vector. This model is a natural extension of a well studied standard \textit{stochastic} setting that allows for non-stationary (adversarial) patterns to arise in the data and hence, might serve as a significantly better approximation for real-world data-streams. We show that in an interesting regime of parameters, when the non-convex online gradient ascent algorithm is initialized with a “warm-start" vector, it provably minimizes the regret with high probability. We further discuss the possibility of computing such a “warm-start" vector, and also the use of regularization to obtain fast regret rates. Our theoretical findings are supported by empirical experiments on both synthetic and real-world data.