Variance-Reduced Conservative Policy Iteration

Naman Agarwal, Brian Bullins, Karan Singh
Proceedings of The 34th International Conference on Algorithmic Learning Theory, PMLR 201:3-33, 2023.

Abstract

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after sampling $O(\varepsilon^{-2})$ times, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.
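For context, the algorithmic skeleton behind the abstract is Conservative Policy Iteration: each round reduces policy improvement to an empirical risk minimization (ERM) problem over the policy class, and the resulting greedy policy $\pi'$ is mixed in conservatively, $\pi \leftarrow (1-\alpha)\pi + \alpha\pi'$. The Python sketch below illustrates only this loop structure under stated assumptions; MixturePolicy, estimate_advantage, and erm_oracle are hypothetical stand-ins, and the paper's variance-reduction scheme (which yields the improved $O(\varepsilon^{-3})$ rate) is only gestured at in a comment, not implemented.

import random

class MixturePolicy:
    # Policy maintained as a weighted mixture of past greedy policies;
    # the weights always sum to one.
    def __init__(self, base):
        self.components = [(1.0, base)]

    def update(self, new_policy, alpha):
        # Conservative update: pi <- (1 - alpha) * pi + alpha * new_policy.
        self.components = [(w * (1.0 - alpha), p) for (w, p) in self.components]
        self.components.append((alpha, new_policy))

    def act(self, state):
        # Sample a component policy according to the mixture weights.
        r, acc = random.random(), 0.0
        for w, p in self.components:
            acc += w
            if r <= acc:
                return p(state)
        return self.components[-1][1](state)

def conservative_policy_iteration(base_policy, estimate_advantage, erm_oracle,
                                  rounds, alpha):
    # estimate_advantage(pi): rollout-based advantage estimates under pi's
    #   state distribution. A variance-reduced variant would reuse stale
    #   estimates and correct them with small fresh batches, rather than
    #   re-estimating from scratch each round.
    # erm_oracle(adv): fits a policy in the class that maximizes the
    #   estimated advantage, i.e., the ERM reduction the abstract refers to.
    pi = MixturePolicy(base_policy)
    for _ in range(rounds):
        adv = estimate_advantage(pi)
        pi.update(erm_oracle(adv), alpha)
    return pi

Keeping $\alpha$ small is what makes the update "conservative": the new mixture's state distribution stays close to the old one, which is the standard lever in CPI-style analyses.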

Cite this Paper


BibTeX
@InProceedings{pmlr-v201-agarwal23a,
  title     = {Variance-Reduced Conservative Policy Iteration},
  author    = {Agarwal, Naman and Bullins, Brian and Singh, Karan},
  booktitle = {Proceedings of The 34th International Conference on Algorithmic Learning Theory},
  pages     = {3--33},
  year      = {2023},
  editor    = {Agrawal, Shipra and Orabona, Francesco},
  volume    = {201},
  series    = {Proceedings of Machine Learning Research},
  month     = {20 Feb--23 Feb},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v201/agarwal23a/agarwal23a.pdf},
  url       = {https://proceedings.mlr.press/v201/agarwal23a.html}
}
Endnote
%0 Conference Paper
%T Variance-Reduced Conservative Policy Iteration
%A Naman Agarwal
%A Brian Bullins
%A Karan Singh
%B Proceedings of The 34th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Shipra Agrawal
%E Francesco Orabona
%F pmlr-v201-agarwal23a
%I PMLR
%P 3--33
%U https://proceedings.mlr.press/v201/agarwal23a.html
%V 201
APA
Agarwal, N., Bullins, B. & Singh, K. (2023). Variance-Reduced Conservative Policy Iteration. Proceedings of The 34th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 201:3-33. Available from https://proceedings.mlr.press/v201/agarwal23a.html.
