Online Learning with Non-Convex Losses and Non-Stationary Regret
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:235-243, 2018.
Abstract
In this paper, we consider online learning with non-convex loss functions. Following Besbes et al. [2015], we adopt non-stationary regret as the performance metric. In particular, we study regret bounds under different assumptions on the information available about the loss functions. When the gradient of the loss function at the decision point is available, we propose an online normalized gradient descent (ONGD) algorithm to solve the online learning problem. When only the value of the loss function at the decision point is available, we propose a bandit online normalized gradient descent (BONGD) algorithm. Under a condition we call weak pseudo-convexity (WPC), we show that both algorithms achieve a cumulative regret bound of $O(\sqrt{T+V_T T})$, where $V_T$ is the total temporal variation of the loss functions, thus establishing a sublinear regret bound for online learning with non-convex loss functions under a non-stationary regret measure.
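The normalized-gradient update at the heart of ONGD admits a compact sketch. The code below is a minimal illustration, not the authors' implementation: the feasible set (an l2 ball), the fixed step size `eta`, and the projection routine are assumptions made for concreteness, and the paper's actual step-size schedule (which depends on $T$ and $V_T$) is not reproduced. In the bandit variant (BONGD), the true gradient would be replaced by an estimate built from observed function values; that estimation step is omitted here.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto an l2 ball (stand-in for the feasible set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def ongd(grad_fns, x0, eta, radius=1.0):
    """Sketch of online normalized gradient descent.

    grad_fns : sequence of callables; grad_fns[t](x) returns the gradient
               of the round-t loss at the current decision x.
    x0       : initial decision in the feasible set.
    eta      : step size (a tuning choice, assumed here for illustration).
    """
    x = np.asarray(x0, dtype=float)
    decisions = []
    for grad_f in grad_fns:
        decisions.append(x.copy())      # play x_t, then observe feedback
        g = grad_f(x)
        norm = np.linalg.norm(g)
        if norm > 0:                    # normalize the gradient before stepping
            x = project_l2_ball(x - eta * g / norm, radius)
    return decisions
```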