Empirical Risk Minimization for Stochastic Convex Optimization: $O(1/n)$- and $O(1/n^2)$-type of Risk Bounds

Lijun Zhang; Tianbao Yang; Rong Jin

Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1954-1979, 2017.

Abstract

Although there exist plentiful theories of empirical risk minimization (ERM) for supervised learning, current theoretical understandings of ERM for a related problem, stochastic convex optimization (SCO), are limited. In this work, we strengthen the realm of ERM for SCO by exploiting smoothness and strong convexity conditions to improve the risk bounds. First, we establish an $\widetilde{O}(d/n + \sqrt{F_*/n})$ risk bound when the random function is nonnegative, convex and smooth, and the expected function is Lipschitz continuous, where $d$ is the dimensionality of the problem, $n$ is the number of samples, and $F_*$ is the minimal risk. Thus, when $F_*$ is small we obtain an $\widetilde{O}(d/n)$ risk bound, which is analogous to the $\widetilde{O}(1/n)$ optimistic rate of ERM for supervised learning. Second, if the objective function is also $\lambda$-strongly convex, we prove an $\widetilde{O}(d/n + \kappa F_*/n)$ risk bound where $\kappa$ is the condition number, and improve it to $O(1/[\lambda n^2] + \kappa F_*/n)$ when $n=\widetilde{\Omega}(\kappa d)$. As a result, we obtain an $O(\kappa/n^2)$ risk bound under the condition that $n$ is large and $F_*$ is small, which, to the best of our knowledge, is the first $O(1/n^2)$-type of risk bound of ERM. Third, we stress that the above results are established in a unified framework, which allows us to derive new risk bounds under weaker conditions, e.g., without convexity of the random function. Finally, we demonstrate that to achieve an $O(1/[\lambda n^2] + \kappa F_*/n)$ risk bound for supervised learning, the $\widetilde{\Omega}(\kappa d)$ requirement on $n$ can be replaced with $\Omega(\kappa^2)$, which is dimensionality-independent.
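The setting and the phrase "when $F_*$ is small" can be made concrete with the following short worked sketch. The notation is our assumption in standard SCO form and may not match the paper's. The feasible domain is $\mathcal{W}$, the i.i.d. random functions are $f_1, \dots, f_n$, the empirical risk is $\widehat{F}$, and its minimizer is $\widehat{w}$. The smoothness constant is $H$, and we take the usual definition $\kappa = H/\lambda$. The risk bounds are presumably bounds on the excess risk $F(\widehat{w}) - F_*$, and the first bound is read as $\widetilde{O}(d/n + \sqrt{F_*/n})$.

```latex
% Assumed notation (not taken from the paper): W is the feasible domain,
% f_1, ..., f_n are i.i.d. draws of the random function, H is the
% smoothness constant, and kappa = H / lambda.
\begin{align*}
  F(w) &= \mathbb{E}_{f}\bigl[f(w)\bigr],
  & F_* &= \min_{w \in \mathcal{W}} F(w), \\
  \widehat{F}(w) &= \frac{1}{n} \sum_{i=1}^{n} f_i(w),
  & \widehat{w} &\in \operatorname*{argmin}_{w \in \mathcal{W}} \widehat{F}(w).
\end{align*}
% Reading "F_* is small" off the stated bounds:
\begin{align*}
  \sqrt{F_*/n} \le \frac{d}{n}
    &\iff F_* \le \frac{d^2}{n}, \\
  \frac{\kappa F_*}{n} \le \frac{1}{\lambda n^2}
    &\iff F_* \le \frac{1}{\lambda \kappa n} = \frac{1}{H n}, \\
  \frac{1}{\lambda n^2} &= \frac{\kappa}{H n^2}.
\end{align*}
```

Under these assumptions, the first bound reduces to $\widetilde{O}(d/n)$ once $F_* = O(d^2/n)$. The strongly convex bound reduces to $O(1/[\lambda n^2])$ once $F_* = O(1/(Hn))$. If $H$ is treated as a constant, $1/[\lambda n^2]$ is the $O(\kappa/n^2)$ rate quoted above.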
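The Python sketch below shows ERM for SCO as a concrete computation. It is not the paper's method or experiment. It builds a hypothetical toy instance in which the random function $f(w; x, y) = \frac{1}{2}(x^\top w - y)^2$ is nonnegative and smooth, with $x$ uniform on $\{-1, +1\}^d$. The expected function $F$ is then 1-strongly convex with minimal risk $F_* = \sigma^2/2$. The helper names (`sample`, `erm`, `population_risk`, `excess_risk`, `mean_excess_risk`) and all parameter values are illustrative assumptions. ERM here is unconstrained, so the sketch ignores any bounded domain the analysis may assume. It does not check the paper's rates or constants.

```python
import numpy as np

# Hypothetical toy SCO instance (not from the paper):
#   random function f(w; x, y) = 0.5 * (x @ w - y)**2, nonnegative and smooth,
#   x uniform on {-1, +1}^d (so E[x x^T] = I and each f is d-smooth),
#   y = x @ w0 + sigma * eps with eps ~ N(0, 1) independent of x.
# Then F(w) = E f(w; x, y) = 0.5 * ||w - w0||^2 + 0.5 * sigma^2, which is
# 1-strongly convex with minimizer w0 and minimal risk F_* = 0.5 * sigma^2.


def sample(n, w0, sigma, rng):
    """Draw n i.i.d. pairs (x_i, y_i) from the toy distribution."""
    X = rng.choice([-1.0, 1.0], size=(n, w0.size))
    y = X @ w0 + sigma * rng.standard_normal(n)
    return X, y


def erm(X, y):
    """Minimize the empirical risk (1/n) * sum_i f(w; x_i, y_i), i.e. least squares."""
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w_hat


def population_risk(w, w0, sigma):
    """F(w) in closed form for this toy distribution."""
    return 0.5 * float(np.sum((w - w0) ** 2)) + 0.5 * sigma ** 2


def excess_risk(w, w0):
    """F(w) - F_* = 0.5 * ||w - w0||^2, the quantity a risk bound controls."""
    return 0.5 * float(np.sum((w - w0) ** 2))


def mean_excess_risk(n, d=5, sigma=0.1, trials=200, seed=0):
    """Monte Carlo average of F(w_hat) - F_* over independent samples of size n."""
    rng = np.random.default_rng(seed)
    w0 = np.ones(d) / np.sqrt(d)
    return float(np.mean([
        excess_risk(erm(*sample(n, w0, sigma, rng)), w0)
        for _ in range(trials)
    ]))


if __name__ == "__main__":
    for n in (50, 200, 800, 3200):
        print(n, mean_excess_risk(n))
```

In this well-specified toy case, setting `sigma=0.0` makes $F_* = 0$, and ERM then recovers `w0` up to floating-point error. Increasing `sigma` raises both $F_*$ and the average excess risk. This mirrors, only qualitatively, the abstract's point that small $F_*$ permits faster rates.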

Cite this Paper

BibTeX


@InProceedings{pmlr-v65-zhang17a,
  title = 	 {Empirical Risk Minimization for Stochastic Convex Optimization: ${O}(1/n)$- and ${O}(1/n^2)$-type of Risk Bounds},
  author = 	 {Zhang, Lijun and Yang, Tianbao and Jin, Rong},
  booktitle = 	 {Proceedings of the 2017 Conference on Learning Theory},
  pages = 	 {1954--1979},
  year = 	 {2017},
  editor = 	 {Kale, Satyen and Shamir, Ohad},
  volume = 	 {65},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--10 Jul},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v65/zhang17a/zhang17a.pdf},
  url = 	 {https://proceedings.mlr.press/v65/zhang17a.html},
  abstract = 	 {Although there exist plentiful theories of empirical risk minimization (ERM) for supervised learning, current theoretical understandings of ERM for a related problem, stochastic convex optimization (SCO), are limited. In this work, we strengthen the realm of ERM for SCO by exploiting smoothness and strong convexity conditions to improve the risk bounds. First, we establish an $\widetilde{O}(d/n + \sqrt{F_*/n})$ risk bound when the random function is nonnegative, convex and smooth, and the expected function is Lipschitz continuous, where $d$ is the dimensionality of the problem, $n$ is the number of samples, and $F_*$ is the minimal risk. Thus, when $F_*$ is small we obtain an $\widetilde{O}(d/n)$ risk bound, which is analogous to the $\widetilde{O}(1/n)$ optimistic rate of ERM for supervised learning. Second, if the objective function is also $\lambda$-strongly convex, we prove an $\widetilde{O}(d/n + \kappa F_*/n)$ risk bound where $\kappa$ is the condition number, and improve it to $O(1/[\lambda n^2] + \kappa F_*/n)$ when $n=\widetilde{\Omega}(\kappa d)$. As a result, we obtain an $O(\kappa/n^2)$ risk bound under the condition that $n$ is large and $F_*$ is small, which, to the best of our knowledge, is the first $O(1/n^2)$-type of risk bound of ERM. Third, we stress that the above results are established in a unified framework, which allows us to derive new risk bounds under weaker conditions, e.g., without convexity of the random function. Finally, we demonstrate that to achieve an $O(1/[\lambda n^2] + \kappa F_*/n)$ risk bound for supervised learning, the $\widetilde{\Omega}(\kappa d)$ requirement on $n$ can be replaced with $\Omega(\kappa^2)$, which is dimensionality-independent.}
}

Endnote

%0 Conference Paper
%T Empirical Risk Minimization for Stochastic Convex Optimization: $O(1/n)$- and $O(1/n^2)$-type of Risk Bounds
%A Lijun Zhang
%A Tianbao Yang
%A Rong Jin
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-zhang17a
%I PMLR
%P 1954--1979
%U https://proceedings.mlr.press/v65/zhang17a.html
%V 65
%X Although there exist plentiful theories of empirical risk minimization (ERM) for supervised learning, current theoretical understandings of ERM for a related problem, stochastic convex optimization (SCO), are limited. In this work, we strengthen the realm of ERM for SCO by exploiting smoothness and strong convexity conditions to improve the risk bounds. First, we establish an $\widetilde{O}(d/n + \sqrt{F_*/n})$ risk bound when the random function is nonnegative, convex and smooth, and the expected function is Lipschitz continuous, where $d$ is the dimensionality of the problem, $n$ is the number of samples, and $F_*$ is the minimal risk. Thus, when $F_*$ is small we obtain an $\widetilde{O}(d/n)$ risk bound, which is analogous to the $\widetilde{O}(1/n)$ optimistic rate of ERM for supervised learning. Second, if the objective function is also $\lambda$-strongly convex, we prove an $\widetilde{O}(d/n + \kappa F_*/n)$ risk bound where $\kappa$ is the condition number, and improve it to $O(1/[\lambda n^2] + \kappa F_*/n)$ when $n=\widetilde{\Omega}(\kappa d)$. As a result, we obtain an $O(\kappa/n^2)$ risk bound under the condition that $n$ is large and $F_*$ is small, which, to the best of our knowledge, is the first $O(1/n^2)$-type of risk bound of ERM. Third, we stress that the above results are established in a unified framework, which allows us to derive new risk bounds under weaker conditions, e.g., without convexity of the random function. Finally, we demonstrate that to achieve an $O(1/[\lambda n^2] + \kappa F_*/n)$ risk bound for supervised learning, the $\widetilde{\Omega}(\kappa d)$ requirement on $n$ can be replaced with $\Omega(\kappa^2)$, which is dimensionality-independent.

APA


Zhang, L., Yang, T., & Jin, R. (2017). Empirical Risk Minimization for Stochastic Convex Optimization: $O(1/n)$- and $O(1/n^2)$-type of Risk Bounds. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:1954-1979. Available from https://proceedings.mlr.press/v65/zhang17a.html.
