Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization

Gergely Neu, Nneka Okolo
Proceedings of The 34th International Conference on Algorithmic Learning Theory, PMLR 201:1101-1123, 2023.

Abstract

We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation. Assuming that the feature map approximately satisfies standard realizability and Bellman-closedness conditions and also that the feature vectors of all state-action pairs are representable as convex combinations of a small core set of state-action pairs, we show that our method outputs a near-optimal policy after a polynomial number of queries to the generative model. Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector, and does not need to execute computationally expensive local planning subroutines in runtime.

Cite this Paper


BibTeX
@InProceedings{pmlr-v201-neu23a, title = {Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization}, author = {Neu, Gergely and Okolo, Nneka}, booktitle = {Proceedings of The 34th International Conference on Algorithmic Learning Theory}, pages = {1101--1123}, year = {2023}, editor = {Agrawal, Shipra and Orabona, Francesco}, volume = {201}, series = {Proceedings of Machine Learning Research}, month = {20 Feb--23 Feb}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v201/neu23a/neu23a.pdf}, url = {https://proceedings.mlr.press/v201/neu23a.html}, abstract = {We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation. Assuming that the feature map approximately satisfies standard realizability and Bellman-closedness conditions and also that the feature vectors of all state-action pairs are representable as convex combinations of a small core set of state-action pairs, we show that our method outputs a near-optimal policy after a polynomial number of queries to the generative model. Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector, and does not need to execute computationally expensive local planning subroutines in runtime.} }
Endnote
%0 Conference Paper %T Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization %A Gergely Neu %A Nneka Okolo %B Proceedings of The 34th International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2023 %E Shipra Agrawal %E Francesco Orabona %F pmlr-v201-neu23a %I PMLR %P 1101--1123 %U https://proceedings.mlr.press/v201/neu23a.html %V 201 %X We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation. Assuming that the feature map approximately satisfies standard realizability and Bellman-closedness conditions and also that the feature vectors of all state-action pairs are representable as convex combinations of a small core set of state-action pairs, we show that our method outputs a near-optimal policy after a polynomial number of queries to the generative model. Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector, and does not need to execute computationally expensive local planning subroutines in runtime.
APA
Neu, G. & Okolo, N.. (2023). Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization. Proceedings of The 34th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 201:1101-1123 Available from https://proceedings.mlr.press/v201/neu23a.html.

Related Material