Optimistic Planning by Regularized Dynamic Programming

Antoine Moulin, Gergely Neu
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:25337-25357, 2023.

Abstract

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.
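As a rough illustration of the idea the abstract describes (not the paper's exact algorithm, whose analysis concerns linear mixture MDPs with least-squares transition estimates), the following minimal Python sketch runs value iteration in which the hard max backup is replaced by an entropy-regularized log-sum-exp operator, with an optional optimism bonus added to the rewards. The tabular setting, the choice of entropy regularization, and all names below are assumptions made for the example.

import numpy as np

def regularized_value_iteration(P, r, gamma=0.9, eta=10.0, bonus=None, iters=500):
    # Illustrative sketch only, not the paper's method.
    # P: (S, A, S) transition kernel, r: (S, A) rewards in [0, 1].
    # eta: inverse temperature of the entropy regularizer; as eta grows,
    # the log-sum-exp backup below approaches the standard hard-max backup.
    # bonus: optional (S, A) optimism bonus added to the rewards
    # (e.g., a count- or confidence-based exploration bonus).
    S, A, _ = P.shape
    r_opt = r if bonus is None else r + bonus
    V = np.zeros(S)
    for _ in range(iters):
        Q = r_opt + gamma * P @ V  # (S, A) state-action values
        # Entropy-regularized backup: a numerically stable log-sum-exp
        # replaces the hard max of standard value iteration.
        m = Q.max(axis=1)
        V = m + np.log(np.exp(eta * (Q - m[:, None])).sum(axis=1)) / eta
    # The corresponding softmax policy.
    pi = np.exp(eta * (Q - V[:, None]))
    pi /= pi.sum(axis=1, keepdims=True)
    return V, Q, pi

# Usage on a small random MDP.
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))  # each (s, a) row sums to one
r = rng.uniform(size=(S, A))
V, Q, pi = regularized_value_iteration(P, r, bonus=0.1 * np.ones((S, A)))
print(V)

Note that the log-sum-exp operator upper-bounds the hard max by at most log(A)/eta, so the regularizer itself contributes a controlled amount of optimism; the paper's actual algorithm combines a regularized update of this flavor with transition models estimated by least squares in the linear mixture setting.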

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-moulin23a,
  title     = {Optimistic Planning by Regularized Dynamic Programming},
  author    = {Moulin, Antoine and Neu, Gergely},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {25337--25357},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/moulin23a/moulin23a.pdf},
  url       = {https://proceedings.mlr.press/v202/moulin23a.html},
  abstract  = {We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.}
}
Endnote
%0 Conference Paper
%T Optimistic Planning by Regularized Dynamic Programming
%A Antoine Moulin
%A Gergely Neu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-moulin23a
%I PMLR
%P 25337--25357
%U https://proceedings.mlr.press/v202/moulin23a.html
%V 202
%X We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.
APA
Moulin, A. & Neu, G. (2023). Optimistic Planning by Regularized Dynamic Programming. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:25337-25357. Available from https://proceedings.mlr.press/v202/moulin23a.html.
