Scalable Bilinear Pi Learning Using State and Action Features

Yichen Chen; Lihong Li; Mengdi Wang

Scalable Bilinear Pi Learning Using State and Action Features

Yichen Chen, Lihong Li, Mengdi Wang

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:834-843, 2018.

Abstract

Approximate linear programming (ALP) represents one of the major algorithmic families to solve large-scale Markov decision processes (MDP). In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts linear and bilinear models to represent the high-dimensional value function and state-action distributions, respectively, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDPs. Second, it operates in a fully online fashion without having to store any sample, thus having minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.

Cite this Paper

BibTeX

@InProceedings{pmlr-v80-chen18e,
  title = 	 {Scalable Bilinear Pi Learning Using State and Action Features},
  author =       {Chen, Yichen and Li, Lihong and Wang, Mengdi},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {834--843},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/chen18e/chen18e.pdf},
  url = 	 {https://proceedings.mlr.press/v80/chen18e.html},
  abstract = 	 {Approximate linear programming (ALP) represents one of the major algorithmic families to solve large-scale Markov decision processes (MDP). In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts linear and bilinear models to represent the high-dimensional value function and state-action distributions, respectively, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDPs. Second, it operates in a fully online fashion without having to store any sample, thus having minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.}
}

Endnote

%0 Conference Paper
%T Scalable Bilinear Pi Learning Using State and Action Features
%A Yichen Chen
%A Lihong Li
%A Mengdi Wang
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause	
%F pmlr-v80-chen18e
%I PMLR
%P 834--843
%U https://proceedings.mlr.press/v80/chen18e.html
%V 80
%X Approximate linear programming (ALP) represents one of the major algorithmic families to solve large-scale Markov decision processes (MDP). In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts linear and bilinear models to represent the high-dimensional value function and state-action distributions, respectively, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDPs. Second, it operates in a fully online fashion without having to store any sample, thus having minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.

APA

Chen, Y., Li, L. & Wang, M.. (2018). Scalable Bilinear Pi Learning Using State and Action Features. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:834-843 Available from https://proceedings.mlr.press/v80/chen18e.html.

Scalable Bilinear Pi Learning Using State and Action Features

Abstract

Cite this Paper

Related Material