Model-Free Robust Average-Reward Reinforcement Learning

Yue Wang; Alvaro Velasquez; George K. Atia; Ashley Prater-Bennette; Shaofeng Zou

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:36431-36469, 2023.

Abstract

Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on the robust average-reward MDPs under the model-free setting. We first theoretically characterize the structure of solutions to the robust average-reward Bellman equation, which is essential for our later convergence analysis. We then design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, Chi-squared divergence, Kullback-Leibler (KL) divergence, and Wasserstein distance.
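
As a small illustration of what "worst-case performance over an uncertainty set" means for the contamination model named above, the sketch below evaluates the worst-case expected value of a value vector. It assumes the common delta-contamination form of the set, {(1 - delta) * p + delta * q : q any distribution}, for which the minimum has a simple closed form. The function name, arguments, and numbers are hypothetical and are not taken from the paper; this is not the paper's robust RVI TD or robust RVI Q-learning algorithm, only the inner worst-case expectation such methods would need.

```python
def contamination_worst_case(p, v, delta):
    """Worst-case expectation of v over the set
    {(1 - delta) * p + delta * q : q is any distribution on the states}.

    p     : nominal next-state probabilities (hypothetical example input)
    v     : value estimate for each next state
    delta : contamination level in [0, 1] (assumed parameterization)
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if len(p) != len(v):
        raise ValueError("p and v must have the same length")
    # Expectation under the nominal transition kernel.
    nominal_mean = sum(pi * vi for pi, vi in zip(p, v))
    # The adversary places all contaminating mass on the lowest-value state.
    return (1.0 - delta) * nominal_mean + delta * min(v)


nominal = [0.5, 0.3, 0.2]   # illustrative nominal distribution
values = [1.0, 2.0, 4.0]    # illustrative value estimates
robust = contamination_worst_case(nominal, values, delta=0.1)
```

With delta set to 0 the function reduces to the ordinary (non-robust) expectation, which is a quick sanity check on the closed form.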

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-wang23am,
  title = 	 {Model-Free Robust Average-Reward Reinforcement Learning},
  author =       {Wang, Yue and Velasquez, Alvaro and Atia, George K. and Prater-Bennette, Ashley and Zou, Shaofeng},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {36431--36469},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/wang23am/wang23am.pdf},
  url = 	 {https://proceedings.mlr.press/v202/wang23am.html},
  abstract = 	 {Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on the robust average-reward MDPs under the model-free setting. We first theoretically characterize the structure of solutions to the robust average-reward Bellman equation, which is essential for our later convergence analysis. We then design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, Chi-squared divergence, Kullback-Leibler (KL) divergence, and Wasserstein distance.}
}

Endnote

%0 Conference Paper
%T Model-Free Robust Average-Reward Reinforcement Learning
%A Yue Wang
%A Alvaro Velasquez
%A George K. Atia
%A Ashley Prater-Bennette
%A Shaofeng Zou
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-wang23am
%I PMLR
%P 36431--36469
%U https://proceedings.mlr.press/v202/wang23am.html
%V 202
%X Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on the robust average-reward MDPs under the model-free setting. We first theoretically characterize the structure of solutions to the robust average-reward Bellman equation, which is essential for our later convergence analysis. We then design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, Chi-squared divergence, Kullback-Leibler (KL) divergence, and Wasserstein distance.

APA

Wang, Y., Velasquez, A., Atia, G. K., Prater-Bennette, A., & Zou, S. (2023). Model-Free Robust Average-Reward Reinforcement Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:36431-36469. Available from https://proceedings.mlr.press/v202/wang23am.html.
