Online Learning with Non-Convex Losses and Non-Stationary Regret
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:235-243, 2018.
Abstract
In this paper, we consider online learning with non-convex loss functions. Similar to Besbes et al. [2015], we apply non-stationary regret as the performance metric. In particular, we study the regret bounds under different assumptions on the information available regarding the loss functions. When the gradient of the loss function at the decision point is available, we propose an online normalized gradient descent algorithm (ONGD) to solve the online learning problem. In another situation, when only the value of the loss function is available, we propose a bandit online normalized gradient descent algorithm (BONGD). Under a condition to be called weak pseudo-convexity (WPC), we show that both algorithms achieve a cumulative regret bound of $O(\sqrt{T+V_T T})$, where $V_T$ is the total temporal variation of the loss functions, thus establishing a sublinear regret bound for online learning with non-convex loss functions and a non-stationary regret measure.
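As a rough illustration of the kind of update the abstract describes, the following is a minimal sketch of online normalized gradient descent: at each round the learner steps along the gradient of that round's loss, rescaled to unit norm, and projects back onto the feasible set. The step size `eta`, the ball-projection feasible set, and all function names here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def ongd(grad_oracles, x0, eta, radius=1.0):
    """Sketch of online normalized gradient descent (illustrative, not the
    paper's exact algorithm).

    grad_oracles: a sequence of callables, one per round; grad_oracles[t](x)
                  returns the gradient of the round-t loss at x.
    Because the gradient is normalized before stepping, the per-round step
    length is controlled entirely by eta, regardless of gradient magnitude.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for g_t in grad_oracles:
        g = np.asarray(g_t(x), dtype=float)
        norm = np.linalg.norm(g)
        if norm > 0:  # skip the update at an exact stationary point
            x = project_ball(x - eta * g / norm, radius)
        iterates.append(x.copy())
    return iterates
```

For example, running the sketch on a fixed 1-D loss $f_t(x) = (x - 0.5)^2$ drives the iterate to within one step length `eta` of the minimizer, after which it oscillates around it; in the non-stationary setting of the paper, each round would instead supply the gradient of a different, possibly non-convex loss.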