Scalable Bilinear Pi Learning Using State and Action Features

Yichen Chen, Lihong Li, Mengdi Wang
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:834-843, 2018.

Abstract

Approximate linear programming (ALP) represents one of the major algorithmic families for solving large-scale Markov decision processes (MDPs). In this work, we study a primal-dual formulation of the ALP and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts linear and bilinear models to represent the high-dimensional value function and state-action distributions, respectively, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDP. Second, it operates in a fully online fashion without storing any samples, and thus has a minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.
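For readers who want the formulation in symbols: the ALP in question builds on the saddle-point form of Bellman's linear program, $\min_{v}\max_{\mu\ge 0}\,(1-\gamma)\,\nu^\top v+\sum_{s,a}\mu(s,a)\big(r(s,a)+\gamma\,(P_a v)(s)-v(s)\big)$, where $\nu$ is the initial-state distribution and the dual variable $\mu$ plays the role of a state-action occupancy measure. Bilinear $\pi$ learning represents $v$ linearly in state features and $\mu$ bilinearly in state and action features, and updates both from sampled transitions. The sketch below (Python/NumPy) is illustrative only: the oracle interface, feature maps, and plain-gradient updates are simplifying assumptions for exposition, not the paper's exact update rules, which use carefully chosen step sizes and projections.

import numpy as np

def bilinear_pi_learning_sketch(oracle, phi, psi, d1, d2,
                                gamma=0.99, n_iters=100_000,
                                alpha=1e-2, beta=1e-2):
    """Stochastic primal-dual sketch of bilinear pi learning.

    Models (per the abstract): a linear value function v(s) = phi(s) @ w
    and a bilinear occupancy weight mu(s, a) ~ exp(phi(s) @ M @ psi(a)).
    `oracle()` is a hypothetical sampling oracle returning an initial
    state s0 ~ nu together with one transition (s, a, r, s_next).
    """
    w = np.zeros(d1)          # primal variable: value-function weights
    M = np.zeros((d1, d2))    # dual variable: bilinear occupancy parameters
    for _ in range(n_iters):
        s0, s, a, r, s_next = oracle()
        f, g = phi(s), psi(a)
        # Bellman residual at the sampled (s, a, s'):
        delta = r + gamma * (phi(s_next) @ w) - f @ w
        # Dual ascent: shift occupancy mass toward pairs with positive
        # residual (the paper uses an exponentiated-gradient update on a
        # simplex-constrained parameterization; a plain gradient step is
        # shown here for brevity).
        M += beta * delta * np.outer(f, g)
        # Primal descent on the Lagrangian: the (1 - gamma) * nu term
        # plus the occupancy-weighted Bellman-residual gradient.
        mu_sa = np.exp(f @ M @ g)      # unnormalized bilinear weight
        grad_w = (1 - gamma) * phi(s0) + mu_sa * (gamma * phi(s_next) - f)
        w -= alpha * grad_w
    return w, M

A randomized policy can then be read off from the learned dual, e.g. $\pi(a|s)\propto\exp(\phi(s)^\top M\,\psi(a))$, matching the intuition that the dual variable encodes how often the optimal policy visits each state-action pair.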

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-chen18e,
  title     = {Scalable Bilinear Pi Learning Using State and Action Features},
  author    = {Chen, Yichen and Li, Lihong and Wang, Mengdi},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {834--843},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/chen18e/chen18e.pdf},
  url       = {https://proceedings.mlr.press/v80/chen18e.html},
  abstract  = {Approximate linear programming (ALP) represents one of the major algorithmic families for solving large-scale Markov decision processes (MDPs). In this work, we study a primal-dual formulation of the ALP and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts linear and bilinear models to represent the high-dimensional value function and state-action distributions, respectively, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDP. Second, it operates in a fully online fashion without storing any samples, and thus has a minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.}
}
Endnote
%0 Conference Paper
%T Scalable Bilinear Pi Learning Using State and Action Features
%A Yichen Chen
%A Lihong Li
%A Mengdi Wang
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-chen18e
%I PMLR
%P 834--843
%U https://proceedings.mlr.press/v80/chen18e.html
%V 80
%X Approximate linear programming (ALP) represents one of the major algorithmic families for solving large-scale Markov decision processes (MDPs). In this work, we study a primal-dual formulation of the ALP and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts linear and bilinear models to represent the high-dimensional value function and state-action distributions, respectively, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDP. Second, it operates in a fully online fashion without storing any samples, and thus has a minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.
APA
Chen, Y., Li, L. & Wang, M. (2018). Scalable Bilinear Pi Learning Using State and Action Features. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:834-843. Available from https://proceedings.mlr.press/v80/chen18e.html.
