Efficient local planning with linear function approximation
Proceedings of The 33rd International Conference on Algorithmic Learning Theory, PMLR 167:1165-1192, 2022.
Abstract
We study query- and computationally efficient planning algorithms for discounted Markov decision processes (MDPs) with linear function approximation and a simulator. The agent is assumed to have local access to the simulator, meaning that the simulator can be queried only at states that have been encountered in previous steps. We propose two new algorithms for this setting, which we call confident Monte Carlo least-squares policy iteration (Confident MC-LSPI) and confident Monte Carlo Politex (Confident MC-Politex), respectively. The main novelty in our algorithms is that they gradually build a set of state-action pairs (a “core set”) with which they can control extrapolation errors. We show that our algorithms have polynomial query and computational costs in the dimension of the features, the effective planning horizon, and the targeted sub-optimality, while these costs remain independent of the size of the state space. An interesting technical contribution of our work is the introduction of a novel proof technique that makes use of a virtual policy iteration algorithm. We use this method to leverage existing results on approximate policy iteration with $\ell_\infty$-bounded error to show that our algorithm can learn the optimal policy for the given initial state even with only local access to the simulator. We believe that this technique can be extended to broader settings beyond this work.
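To illustrate the core-set idea described in the abstract, the following is a minimal sketch of how a set of state-action features might be grown using an uncertainty check against a regularized Gram matrix. The class name, threshold tau, and regularizer lambda are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

class CoreSet:
    """Hypothetical core set of state-action features with an uncertainty test.

    A new pair is added only when its feature direction is poorly covered by
    the pairs already stored, which is the mechanism that keeps extrapolation
    errors of the linear value estimates under control.
    """

    def __init__(self, dim, reg=1.0, tau=1.0):
        self.features = []            # feature vectors of stored state-action pairs
        self.reg = reg                # ridge regularizer (assumed)
        self.tau = tau                # uncertainty threshold (assumed)
        self.cov = reg * np.eye(dim)  # regularized Gram matrix of core-set features

    def uncertainty(self, phi):
        # phi^T Lambda^{-1} phi: large value means phi is not well represented
        # by the current core set.
        return float(phi @ np.linalg.solve(self.cov, phi))

    def maybe_add(self, phi):
        # Add the pair only if its uncertainty exceeds the threshold; the caller
        # would then typically restart rollouts using the enlarged core set.
        if self.uncertainty(phi) > self.tau:
            self.features.append(np.asarray(phi, dtype=float))
            self.cov += np.outer(phi, phi)
            return True
        return False


# Example usage (purely illustrative):
core = CoreSet(dim=4)
core.maybe_add(np.array([1.0, 0.0, 0.0, 0.0]))   # added: new direction
core.maybe_add(np.array([0.9, 0.0, 0.0, 0.0]))   # likely rejected: already covered
```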