Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

Dhruv Malik; Ashwin Pananjady; Kush Bhatia; Koulik Khamaru; Peter Bartlett; Martin Wainwright

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2916-2925, 2019.

Abstract

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.
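The canonical two-point method described in the abstract can be illustrated with a short Python sketch. This is not the authors' algorithm or code. The following are hypothetical choices made for illustration: the example system, the rollout horizon used to truncate the infinite-horizon cost, the Gaussian initial-state law, the smoothing radius `r`, the step size, and the function names. At each step, the sketch does the following:

- draws a direction `U` uniformly from the unit sphere in the space of gains, so that `D` is the number of entries of `K`;
- draws one random initial state `x0`;
- evaluates the cost of `K + rU` and `K - rU` from that same `x0`;
- steps against the estimate `(D / (2r)) (f_plus - f_minus) U`.

```python
import numpy as np


def rollout_cost(K, A, B, Q, R, x0, horizon=50):
    """Quadratic cost of the linear policy u = -K x, simulated from x0.

    Assumption (hypothetical): the infinite-horizon LQR cost is truncated
    to `horizon` steps.
    """
    x = np.array(x0, dtype=float)
    cost = 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost


def two_point_gradient(K, A, B, Q, R, r, rng):
    """Two-point zero-order gradient estimate at K.

    In expectation this equals the gradient of the cost averaged over a
    ball of radius r around K (and over the random initial state).
    """
    D = K.size  # dimension of the policy parameter
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)  # normalized Gaussian: uniform on the unit Frobenius sphere
    x0 = rng.standard_normal(A.shape[0])  # assumed initial-state law: N(0, I)
    # Both cost evaluations share the same random initial state x0.
    f_plus = rollout_cost(K + r * U, A, B, Q, R, x0)
    f_minus = rollout_cost(K - r * U, A, B, Q, R, x0)
    return (D / (2.0 * r)) * (f_plus - f_minus) * U


def zero_order_descent(K0, A, B, Q, R, steps=500, step_size=2e-4, r=0.05, seed=0):
    """Stochastic descent that only ever queries cost values, never gradients."""
    rng = np.random.default_rng(seed)
    K = np.array(K0, dtype=float)
    for _ in range(steps):
        K = K - step_size * two_point_gradient(K, A, B, Q, R, r, rng)
    return K


if __name__ == "__main__":
    # Hypothetical example system: open-loop stable, two states, two inputs.
    A = 0.8 * np.eye(2)
    B = Q = R = np.eye(2)
    K = zero_order_descent(np.zeros((2, 2)), A, B, Q, R)
    print(K)
```

The step size is deliberately small. A large step along a random direction can push `A - BK` outside the stable region, where the rollout cost, and with it the two-point difference, grows very quickly.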

Cite this Paper

BibTeX

@InProceedings{pmlr-v89-malik19a,
  title     = {Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems},
  author    = {Malik, Dhruv and Pananjady, Ashwin and Bhatia, Kush and Khamaru, Koulik and Bartlett, Peter and Wainwright, Martin},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {2916--2925},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/malik19a/malik19a.pdf},
  url       = {https://proceedings.mlr.press/v89/malik19a.html},
  abstract  = {We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.}
}

Endnote

%0 Conference Paper
%T Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
%A Dhruv Malik
%A Ashwin Pananjady
%A Kush Bhatia
%A Koulik Khamaru
%A Peter Bartlett
%A Martin Wainwright
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-malik19a
%I PMLR
%P 2916--2925
%U https://proceedings.mlr.press/v89/malik19a.html
%V 89
%X We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.

APA

Malik, D., Pananjady, A., Bhatia, K., Khamaru, K., Bartlett, P. & Wainwright, M. (2019). Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:2916-2925. Available from https://proceedings.mlr.press/v89/malik19a.html.