Online Model Selection for Reinforcement Learning with Function Approximation

Jonathan Lee; Aldo Pacchiano; Vidya Muthukumar; Weihao Kong; Emma Brunskill

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3340-3348, 2021.

Abstract

Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and early theoretical results on linear Markov decision processes provide regret bounds that scale with the dimension of the linear approximation. Ideally, we would like to automatically identify the minimal dimension of the approximation that is sufficient to encode an optimal policy. Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees. The learner’s goal is to adapt to the complexity of the optimal algorithm without knowing it a priori. We present a meta-algorithm that successively rejects increasingly complex models using a simple statistical test. Given at least one candidate that satisfies realizability, we prove the meta-algorithm adapts to the optimal complexity with regret that is only marginally suboptimal in the number of episodes and number of candidate algorithms. The dimension and horizon dependencies remain optimal with respect to the best candidate, and our meta-algorithmic approach is flexible to incorporate multiple candidate algorithms and models. Finally, we show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds that depend on the gaps between the maximal values attainable by the candidates.
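The abstract describes the meta-algorithm only at a high level. The following toy Python sketch illustrates the general idea of successive rejection with a simple statistical test. It is not the paper's meta-algorithm, and every name in it (`Candidate`, `run_episode`, `select_model`) is hypothetical. It rests on these assumptions:

- Each base algorithm is treated as a black box whose episode returns lie in [0, 1].
- Candidates are sorted by increasing complexity, and the most complex one is realizable.
- Each candidate's guarantee is an average regret of at most sqrt(dim / n) after n episodes.

Under these assumptions, the current (simpler) candidate is rejected only when its optimistic value estimate falls below the reference candidate's pessimistic one.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Candidate:
    """Hypothetical stand-in for a base RL algorithm with a known regret bound."""
    name: str
    dim: int                              # complexity, e.g. feature dimension
    run_episode: Callable[[int], float]   # episode index -> return in [0, 1]
    returns: List[float] = field(default_factory=list)

    def mean(self) -> float:
        return sum(self.returns) / len(self.returns)

    def claimed_avg_regret(self) -> float:
        # Assumed guarantee: average regret after n episodes <= sqrt(dim / n).
        return math.sqrt(self.dim / len(self.returns))


def hoeffding_width(n: int, delta: float) -> float:
    # Two-sided confidence half-width for the mean of n i.i.d. values in [0, 1].
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def select_model(cands: List[Candidate], episodes_per_round: int,
                 rounds: int, delta: float = 0.05) -> int:
    """Return the index of the candidate still in use after `rounds` rounds.

    `cands` is sorted by increasing dim and the last entry is assumed to be
    realizable; it serves as the reference. Union-bound corrections over
    rounds are omitted for brevity.
    """
    ref = cands[-1]
    i = 0
    for _ in range(rounds):
        if i == len(cands) - 1:
            break  # only the reference remains; nothing left to reject
        cur = cands[i]
        for _ in range(episodes_per_round):
            cur.returns.append(cur.run_episode(len(cur.returns)))
            ref.returns.append(ref.run_episode(len(ref.returns)))
        # If `cur` were realizable, its guarantee keeps this optimistic value
        # above the optimal value, which (w.h.p.) upper-bounds the
        # reference's pessimistic estimate.
        optimistic_cur = (cur.mean() + cur.claimed_avg_regret()
                          + hoeffding_width(len(cur.returns), delta))
        pessimistic_ref = ref.mean() - hoeffding_width(len(ref.returns), delta)
        if optimistic_cur < pessimistic_ref:
            i += 1  # reject the simpler model and move to the next one
    return i


cands = [
    Candidate("d=1", 1, lambda t: 0.2),    # misspecified: caps out at 0.2
    Candidate("d=4", 4, lambda t: 0.9),
    Candidate("d=16", 16, lambda t: 0.9),  # reference, assumed realizable
]
print(select_model(cands, episodes_per_round=200, rounds=10))  # prints 1
```

The reference candidate keeps playing in every round, so its pessimistic estimate tightens over time. A realizable simpler candidate is therefore unlikely to be rejected, while a misspecified one is rejected once its shortfall exceeds the combined confidence and regret slack.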

Cite this Paper

BibTeX

@InProceedings{pmlr-v130-lee21d,
  title     = {Online Model Selection for Reinforcement Learning with Function Approximation},
  author    = {Lee, Jonathan and Pacchiano, Aldo and Muthukumar, Vidya and Kong, Weihao and Brunskill, Emma},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {3340--3348},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/lee21d/lee21d.pdf},
  url       = {https://proceedings.mlr.press/v130/lee21d.html},
  abstract  = {Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and early theoretical results on linear Markov decision processes provide regret bounds that scale with the dimension of the linear approximation. Ideally, we would like to automatically identify the minimal dimension of the approximation that is sufficient to encode an optimal policy. Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees. The learner’s goal is to adapt to the complexity of the optimal algorithm without knowing it a priori. We present a meta-algorithm that successively rejects increasingly complex models using a simple statistical test. Given at least one candidate that satisfies realizability, we prove the meta-algorithm adapts to the optimal complexity with regret that is only marginally suboptimal in the number of episodes and number of candidate algorithms. The dimension and horizon dependencies remain optimal with respect to the best candidate, and our meta-algorithmic approach is flexible to incorporate multiple candidate algorithms and models. Finally, we show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds that depend on the gaps between the maximal values attainable by the candidates.}
}

Endnote

%0 Conference Paper
%T Online Model Selection for Reinforcement Learning with Function Approximation
%A Jonathan Lee
%A Aldo Pacchiano
%A Vidya Muthukumar
%A Weihao Kong
%A Emma Brunskill
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-lee21d
%I PMLR
%P 3340--3348
%U https://proceedings.mlr.press/v130/lee21d.html
%V 130
%X Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and early theoretical results on linear Markov decision processes provide regret bounds that scale with the dimension of the linear approximation. Ideally, we would like to automatically identify the minimal dimension of the approximation that is sufficient to encode an optimal policy. Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees. The learner’s goal is to adapt to the complexity of the optimal algorithm without knowing it a priori. We present a meta-algorithm that successively rejects increasingly complex models using a simple statistical test. Given at least one candidate that satisfies realizability, we prove the meta-algorithm adapts to the optimal complexity with regret that is only marginally suboptimal in the number of episodes and number of candidate algorithms. The dimension and horizon dependencies remain optimal with respect to the best candidate, and our meta-algorithmic approach is flexible to incorporate multiple candidate algorithms and models. Finally, we show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds that depend on the gaps between the maximal values attainable by the candidates.

APA

Lee, J., Pacchiano, A., Muthukumar, V., Kong, W., & Brunskill, E. (2021). Online Model Selection for Reinforcement Learning with Function Approximation. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:3340-3348. Available from https://proceedings.mlr.press/v130/lee21d.html.
