Confident Least Square Value Iteration with Local Access to a Simulator

Botao Hao, Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:2420-2435, 2022.

Abstract

Learning with simulators is ubiquitous in modern reinforcement learning (RL). The simulator can either correspond to a simplified version of the real environment (such as a physics simulation of a robot arm) or to the environment itself (such as in games like Atari and Go). Among algorithms that are provably sample-efficient in this setting, most make the unrealistic assumption that all possible environment states are known before learning begins, or perform global optimistic planning, which is computationally inefficient. In this work, we focus on simulation-based RL under a more realistic local access protocol, where the state space is unknown and the simulator can only be queried at states that have previously been observed (initial states and those returned by previous queries). We propose an algorithm named CONFIDENT-LSVI based on the template of least-square value iteration. CONFIDENT-LSVI incrementally builds a coreset of important states and uses the simulator to revisit them. Assuming that the linear function class has low approximation error under the Bellman optimality operator (a.k.a. low inherent Bellman error), we bound the algorithm's performance in terms of this error, and show that it is query- and computationally efficient.
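
To make the algorithmic idea concrete, the following is a minimal Python sketch, under the assumption of a toy tabular MDP with one-hot features: a local-access simulator that refuses queries at unobserved states, a coreset grown by a feature-uncertainty test, and a plain least-squares value iteration loop over the coreset. The class and parameter names (LocalAccessSimulator, tau, n_iters) and the toy dynamics are illustrative assumptions, not the authors' implementation.

# A minimal illustrative sketch (not the authors' exact CONFIDENT-LSVI):
# a "local access" simulator that can only be queried at previously observed
# states, a coreset grown by a feature-uncertainty test, and a plain
# least-squares value iteration loop over the coreset.
import numpy as np

class LocalAccessSimulator:
    """Wraps a ground-truth toy MDP; queries are allowed only at observed states."""
    def __init__(self, n_states, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
        self.R = rng.uniform(size=(n_states, n_actions))
        self.observed = {0}          # the initial state is known up front
        self.rng = rng

    def query(self, s, a):
        if s not in self.observed:
            raise ValueError("local access violated: state %d never observed" % s)
        s_next = self.rng.choice(len(self.P[s, a]), p=self.P[s, a])
        self.observed.add(s_next)    # the returned state becomes queryable
        return self.R[s, a], s_next

def features(s, a, n_states, n_actions):
    """One-hot (tabular) features; a linear function class for illustration."""
    phi = np.zeros(n_states * n_actions)
    phi[s * n_actions + a] = 1.0
    return phi

def confident_lsvi_sketch(sim, n_states, n_actions, gamma=0.9,
                          tau=0.5, lam=1.0, n_iters=50):
    d = n_states * n_actions
    Lambda = lam * np.eye(d)         # regularized design matrix
    coreset = []                     # (s, a, r, s') tuples we regress on
    theta = np.zeros(d)

    for _ in range(n_iters):
        # Grow the coreset: query observed (s, a) pairs whose features are
        # still "uncertain" under the current design matrix.
        for s in list(sim.observed):
            for a in range(n_actions):
                phi = features(s, a, n_states, n_actions)
                if phi @ np.linalg.solve(Lambda, phi) > tau:
                    r, s_next = sim.query(s, a)
                    coreset.append((s, a, r, s_next))
                    Lambda += np.outer(phi, phi)

        # Least-squares value iteration step over the coreset.
        A = lam * np.eye(d)
        b = np.zeros(d)
        for (s, a, r, s_next) in coreset:
            phi = features(s, a, n_states, n_actions)
            q_next = max(features(s_next, ap, n_states, n_actions) @ theta
                         for ap in range(n_actions))
            A += np.outer(phi, phi)
            b += phi * (r + gamma * q_next)
        theta = np.linalg.solve(A, b)
    return theta

sim = LocalAccessSimulator(n_states=5, n_actions=2)
theta = confident_lsvi_sketch(sim, n_states=5, n_actions=2)
print("learned Q(0, a):", [features(0, a, 5, 2) @ theta for a in range(2)])

The key constraint illustrated is the protocol itself: query() only accepts states in the observed set, and each returned next state becomes queryable, so exploration expands outward from the initial state rather than assuming the full state space is known in advance.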

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-hao22a,
  title     = {Confident Least Square Value Iteration with Local Access to a Simulator},
  author    = {Hao, Botao and Lazic, Nevena and Yin, Dong and Abbasi-Yadkori, Yasin and Szepesvari, Csaba},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {2420--2435},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/hao22a/hao22a.pdf},
  url       = {https://proceedings.mlr.press/v151/hao22a.html},
  abstract  = {Learning with simulators is ubiquitous in modern reinforcement learning (RL). The simulator can either correspond to a simplified version of the real environment (such as a physics simulation of a robot arm) or to the environment itself (such as in games like Atari and Go). Among algorithms that are provably sample-efficient in this setting, most make the unrealistic assumption that all possible environment states are known before learning begins, or perform global optimistic planning, which is computationally inefficient. In this work, we focus on simulation-based RL under a more realistic local access protocol, where the state space is unknown and the simulator can only be queried at states that have previously been observed (initial states and those returned by previous queries). We propose an algorithm named CONFIDENT-LSVI based on the template of least-square value iteration. CONFIDENT-LSVI incrementally builds a coreset of important states and uses the simulator to revisit them. Assuming that the linear function class has low approximation error under the Bellman optimality operator (a.k.a. low inherent Bellman error), we bound the algorithm's performance in terms of this error, and show that it is query- and computationally efficient.}
}
Endnote
%0 Conference Paper
%T Confident Least Square Value Iteration with Local Access to a Simulator
%A Botao Hao
%A Nevena Lazic
%A Dong Yin
%A Yasin Abbasi-Yadkori
%A Csaba Szepesvari
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-hao22a
%I PMLR
%P 2420--2435
%U https://proceedings.mlr.press/v151/hao22a.html
%V 151
%X Learning with simulators is ubiquitous in modern reinforcement learning (RL). The simulator can either correspond to a simplified version of the real environment (such as a physics simulation of a robot arm) or to the environment itself (such as in games like Atari and Go). Among algorithms that are provably sample-efficient in this setting, most make the unrealistic assumption that all possible environment states are known before learning begins, or perform global optimistic planning, which is computationally inefficient. In this work, we focus on simulation-based RL under a more realistic local access protocol, where the state space is unknown and the simulator can only be queried at states that have previously been observed (initial states and those returned by previous queries). We propose an algorithm named CONFIDENT-LSVI based on the template of least-square value iteration. CONFIDENT-LSVI incrementally builds a coreset of important states and uses the simulator to revisit them. Assuming that the linear function class has low approximation error under the Bellman optimality operator (a.k.a. low inherent Bellman error), we bound the algorithm's performance in terms of this error, and show that it is query- and computationally efficient.
APA
Hao, B., Lazic, N., Yin, D., Abbasi-Yadkori, Y. & Szepesvari, C. (2022). Confident Least Square Value Iteration with Local Access to a Simulator. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:2420-2435. Available from https://proceedings.mlr.press/v151/hao22a.html.
