Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data

Kishan Panaganti; Adam Wierman; Eric Mazumdar

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:39324-39363, 2024.

Abstract

The robust $\phi$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust $\phi$-regularized fitted Q-iteration for learning an $\epsilon$-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (with robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of $\phi$-divergences achieving robust optimal policies in high-dimensional systems of arbitrary large state space with general function approximation. Second, we introduce the hybrid robust $\phi$-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. Towards this framework, we propose a model-free algorithm called Hybrid robust Total-variation-regularized Q-iteration. To the best of our knowledge, we provide the first improved out-of-data-distribution assumption in large-scale problems of arbitrary large state space with general function approximation under the hybrid robust $\phi$-regularized reinforcement learning framework.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-panaganti24a,
  title = 	 {Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data},
  author =       {Panaganti, Kishan and Wierman, Adam and Mazumdar, Eric},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {39324--39363},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/panaganti24a/panaganti24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/panaganti24a.html},
  abstract = 	 {The robust $\phi$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust $\phi$-regularized fitted Q-iteration for learning an $\epsilon$-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (with robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of $\phi$-divergences achieving robust optimal policies in high-dimensional systems of arbitrary large state space with general function approximation. Second, we introduce the hybrid robust $\phi$-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. Towards this framework, we propose a model-free algorithm called Hybrid robust Total-variation-regularized Q-iteration. To the best of our knowledge, we provide the first improved out-of-data-distribution assumption in large-scale problems of arbitrary large state space with general function approximation under the hybrid robust $\phi$-regularized reinforcement learning framework.}
}

Endnote

%0 Conference Paper
%T Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data
%A Kishan Panaganti
%A Adam Wierman
%A Eric Mazumdar
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-panaganti24a
%I PMLR
%P 39324--39363
%U https://proceedings.mlr.press/v235/panaganti24a.html
%V 235
%X The robust $\phi$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust $\phi$-regularized fitted Q-iteration for learning an $\epsilon$-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (with robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of $\phi$-divergences achieving robust optimal policies in high-dimensional systems of arbitrary large state space with general function approximation. Second, we introduce the hybrid robust $\phi$-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. Towards this framework, we propose a model-free algorithm called Hybrid robust Total-variation-regularized Q-iteration. To the best of our knowledge, we provide the first improved out-of-data-distribution assumption in large-scale problems of arbitrary large state space with general function approximation under the hybrid robust $\phi$-regularized reinforcement learning framework.

APA


Panaganti, K., Wierman, A., & Mazumdar, E. (2024). Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:39324-39363. Available from https://proceedings.mlr.press/v235/panaganti24a.html.