Variance-Reduced Conservative Policy Iteration

Naman Agarwal; Brian Bullins; Karan Singh

Proceedings of The 34th International Conference on Algorithmic Learning Theory, PMLR 201:3-33, 2023.

Abstract

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after sampling $O(\varepsilon^{-2})$ times, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.
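The abstract describes Conservative Policy Iteration (CPI) only at a high level. As a rough illustration of the conservative mixture update that CPI is built around, pi <- (1 - alpha) * pi + alpha * greedy(Q^pi), the sketch below runs the original, non-variance-reduced scheme on a small random tabular MDP. It assumes exact model-based policy evaluation and a fixed mixing rate `alpha`; it is not the paper's variance-reduced algorithm, involves no sampling or empirical risk minimization, and therefore says nothing about sample complexity. All function names (`random_mdp`, `q_from_v`, `evaluate`, `cpi`, `optimal_values`) and parameter values are hypothetical.

```python
import random

# Hypothetical illustration: classic CPI mixture update with an exact model and a
# fixed alpha. This is NOT the paper's variance-reduced, sample-based algorithm.


def random_mdp(n_states, n_actions, seed=0):
    """Random tabular MDP: P[s][a][s'] transition probabilities, R[s][a] in [0, 1]."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(w)
            P_s.append([x / total for x in w])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R


def q_from_v(V, P, R, gamma):
    """One-step lookahead: Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V(s')."""
    return [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(R[s]))] for s in range(len(R))]


def evaluate(pi, P, R, gamma, sweeps=100):
    """Iterative policy evaluation using the known model; returns V^pi and Q^pi."""
    V = [0.0] * len(R)
    for _ in range(sweeps):
        Q = q_from_v(V, P, R, gamma)
        V = [sum(p * q for p, q in zip(pi[s], Q[s])) for s in range(len(R))]
    return V, q_from_v(V, P, R, gamma)


def cpi(P, R, gamma=0.8, alpha=0.3, iters=300):
    """Conservative policy iteration with a fixed mixing rate alpha.

    Each step mixes the current stochastic policy with the policy that is
    greedy w.r.t. Q^pi, instead of jumping to the greedy policy outright.
    """
    n_s, n_a = len(R), len(R[0])
    pi = [[1.0 / n_a] * n_a for _ in range(n_s)]  # start from the uniform policy
    history = []
    for _ in range(iters):
        V, Q = evaluate(pi, P, R, gamma)
        history.append(V)
        for s in range(n_s):
            best = max(range(n_a), key=lambda a: Q[s][a])
            # Conservative update: keep (1 - alpha) of the old action distribution.
            pi[s] = [(1 - alpha) * p + (alpha if a == best else 0.0)
                     for a, p in enumerate(pi[s])]
    V, _ = evaluate(pi, P, R, gamma)
    history.append(V)
    return pi, history


def optimal_values(P, R, gamma=0.8, iters=500):
    """Value iteration, used only as a reference for V*."""
    V = [0.0] * len(R)
    for _ in range(iters):
        V = [max(row) for row in q_from_v(V, P, R, gamma)]
    return V


if __name__ == "__main__":
    P, R = random_mdp(4, 3, seed=1)
    pi, history = cpi(P, R)
    print("CPI values:", [round(v, 4) for v in history[-1]])
    print("V* (VI):   ", [round(v, 4) for v in optimal_values(P, R)])
```

Because the Q-values here are exact, every mixture step is a policy improvement, so the per-state values recorded in `history` never decrease and approach the value-iteration reference. The setting studied in the paper replaces this exact greedy step with empirical risk minimization over a policy class using sampled data, which is where variance and sample complexity enter.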

Cite this Paper

BibTeX


@InProceedings{pmlr-v201-agarwal23a,
  title     = {Variance-Reduced Conservative Policy Iteration},
  author    = {Agarwal, Naman and Bullins, Brian and Singh, Karan},
  booktitle = {Proceedings of The 34th International Conference on Algorithmic Learning Theory},
  pages     = {3--33},
  year      = {2023},
  editor    = {Agrawal, Shipra and Orabona, Francesco},
  volume    = {201},
  series    = {Proceedings of Machine Learning Research},
  month     = {20 Feb--23 Feb},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v201/agarwal23a/agarwal23a.pdf},
  url       = {https://proceedings.mlr.press/v201/agarwal23a.html},
  abstract  = {We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after sampling $O(\varepsilon^{-2})$ times, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.}
}

Endnote

%0 Conference Paper
%T Variance-Reduced Conservative Policy Iteration
%A Naman Agarwal
%A Brian Bullins
%A Karan Singh
%B Proceedings of The 34th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Shipra Agrawal
%E Francesco Orabona
%F pmlr-v201-agarwal23a
%I PMLR
%P 3--33
%U https://proceedings.mlr.press/v201/agarwal23a.html
%V 201
%X We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after sampling $O(\varepsilon^{-2})$ times, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.

APA


Agarwal, N., Bullins, B. & Singh, K. (2023). Variance-Reduced Conservative Policy Iteration. Proceedings of The 34th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 201:3-33. Available from https://proceedings.mlr.press/v201/agarwal23a.html.
