Non-stationary Reinforcement Learning under General Function Approximation

Songtao Feng; Ming Yin; Ruiquan Huang; Yu-Xiang Wang; Jing Yang; Yingbin Liang

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:9976-10007, 2023.

Abstract

General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called the dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better in the small variation budget scenario than the existing UCB-type algorithms. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-feng23e,
  title = 	 {Non-stationary Reinforcement Learning under General Function Approximation},
  author =       {Feng, Songtao and Yin, Ming and Huang, Ruiquan and Wang, Yu-Xiang and Yang, Jing and Liang, Yingbin},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {9976--10007},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/feng23e/feng23e.pdf},
  url = 	 {https://proceedings.mlr.press/v202/feng23e.html},
  abstract = {General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called the dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better in the small variation budget scenario than the existing UCB-type algorithms. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.}
}

Endnote

%0 Conference Paper
%T Non-stationary Reinforcement Learning under General Function Approximation
%A Songtao Feng
%A Ming Yin
%A Ruiquan Huang
%A Yu-Xiang Wang
%A Jing Yang
%A Yingbin Liang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-feng23e
%I PMLR
%P 9976--10007
%U https://proceedings.mlr.press/v202/feng23e.html
%V 202
%X General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called the dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better in the small variation budget scenario than the existing UCB-type algorithms. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.

APA

Feng, S., Yin, M., Huang, R., Wang, Y.-X., Yang, J., & Liang, Y. (2023). Non-stationary Reinforcement Learning under General Function Approximation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:9976-10007. Available from https://proceedings.mlr.press/v202/feng23e.html
