On the Regret Minimization of Nonconvex Online Gradient Ascent for Online PCA
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1349-1373, 2019.
Abstract
In this paper we focus on the problem of Online Principal Component Analysis in the regret minimization framework. For this problem, all existing regret minimization algorithms for the fully-adversarial setting are based on a positive semidefinite convex relaxation, and hence require quadratic memory and an SVD computation (either thin or full) on each iteration, which amounts to at least quadratic runtime per iteration. This is in stark contrast to the corresponding stochastic i.i.d. variant of the problem, which has been studied extensively in recent years and admits very efficient gradient ascent algorithms that work directly on the natural nonconvex formulation of the problem, and hence require only linear memory and linear runtime per iteration. This raises the question: can nonconvex online gradient ascent algorithms be shown to minimize regret in online adversarial settings? In this paper we take a step towards answering this question. We introduce an adversarially-perturbed spiked-covariance model in which each data point is assumed to follow a fixed stochastic distribution with a nonzero spectral gap in the covariance matrix, but is then perturbed by some adversarial vector. This model is a natural extension of a well-studied standard stochastic setting that allows nonstationary (adversarial) patterns to arise in the data, and hence might serve as a significantly better approximation for real-world data streams. We show that in an interesting regime of parameters, when the nonconvex online gradient ascent algorithm is initialized with a "warm-start" vector, it provably minimizes the regret with high probability. We further discuss the possibility of computing such a "warm-start" vector, and also the use of regularization to obtain fast regret rates. Our theoretical findings are supported by empirical experiments on both synthetic and real-world data.
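To make the setting concrete, the following is a minimal sketch of nonconvex online gradient ascent for the leading principal component, run on data that loosely follows the model described above. It is an illustration under stated assumptions, not the paper's exact algorithm or experimental setup: the function names, the constant step size `eta`, the Gaussian spiked-covariance distribution, and the use of random bounded-norm perturbations as a stand-in for an adversary are all hypothetical choices made here. The update keeps only a single vector in memory and costs time linear in the dimension per iteration.

```python
import numpy as np


def online_gradient_ascent_pca(X, w0, eta):
    """Nonconvex online gradient ascent on the unit sphere (illustrative sketch).

    X   : (T, d) array, one data point per round.
    w0  : initial ("warm-start") vector of length d.
    eta : constant step size (an assumption of this sketch).
    Returns the cumulative reward sum_t (w_t^T x_t)^2 and the final iterate.
    """
    w = w0 / np.linalg.norm(w0)
    reward = 0.0
    for x in X:
        proj = x @ w                # inner product, O(d) time
        reward += proj ** 2         # reward of the vector played this round
        w = w + eta * proj * x      # ascent step; the factor 2 of the gradient is absorbed into eta
        w /= np.linalg.norm(w)      # project back onto the unit sphere
    return reward, w


def regret(X, reward):
    """Regret against the best fixed unit vector in hindsight."""
    best = np.linalg.eigvalsh(X.T @ X)[-1]  # largest eigenvalue of sum_t x_t x_t^T
    return best - reward


def make_perturbed_spiked_data(T, d, gap, noise, seed):
    """Hypothetical data generator: covariance I + gap * u u^T plus bounded perturbations."""
    rng = np.random.default_rng(seed)
    u = np.zeros(d)
    u[0] = 1.0                                   # leading direction of the covariance
    q = rng.standard_normal((T, d))
    q[:, 0] *= np.sqrt(1.0 + gap)                # variance 1 + gap along u, 1 elsewhere
    v = rng.standard_normal((T, d))
    v *= noise / np.linalg.norm(v, axis=1, keepdims=True)  # perturbations of norm `noise`
    return q + v, u


X, u = make_perturbed_spiked_data(T=2000, d=10, gap=4.0, noise=0.1, seed=0)
w0 = u + 0.1 * np.random.default_rng(1).standard_normal(10)  # crude warm start near u
reward, w = online_gradient_ascent_pca(X, w0, eta=0.005)
print("regret:", regret(X, reward), "alignment with u:", abs(w @ u))
```

Computing the hindsight benchmark in `regret` requires an eigendecomposition of a d-by-d matrix, but that is done once for evaluation only; the online update itself never forms a matrix.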