Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data

Kishan Panaganti, Adam Wierman, Eric Mazumdar
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:39324-39363, 2024.

Abstract

The robust $\phi$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust to parameter uncertainties arising from mismatches between the simulator (nominal) model and the real-world environment. This work makes two contributions. First, we propose a model-free algorithm, Robust $\phi$-regularized fitted Q-iteration, for learning an $\epsilon$-optimal robust policy using only historical data collected by rolling out a behavior policy (one satisfying a robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis, covering a class of $\phi$-divergences, that achieves robust optimal policies in high-dimensional systems with arbitrarily large state spaces under general function approximation. Second, we introduce the hybrid robust $\phi$-regularized reinforcement learning framework, which learns an optimal robust policy from both historical data and online sampling. For this framework, we propose a model-free algorithm, Hybrid robust Total-variation-regularized Q-iteration. To the best of our knowledge, we provide the first improvement of the out-of-data-distribution assumption for large-scale problems with arbitrarily large state spaces under general function approximation in the hybrid robust $\phi$-regularized reinforcement learning framework.
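Since the abstract is terse about what a $\phi$-regularized robust Bellman update looks like, the following is a minimal, hypothetical sketch of fitted Q-iteration in the total-variation (TV) case, not the authors' implementation. It assumes a linear function class $Q(s,a) = \varphi(s)^\top w_a$, an offline batch collected under the behavior policy, and the standard closed form of the TV-penalized inner problem, $\inf_P \{ \mathbb{E}_P[V] + \lambda \, \mathrm{TV}(P, P_0) \} = \mathbb{E}_{P_0}[\min(V(s'), \min_s V(s) + \lambda)]$; the batch minimum used below is only a proxy for $\min_s V(s)$, and all function and variable names are illustrative.

import numpy as np

def tv_regularized_target(rews, v_next, lam, v_min, gamma):
    # TV-penalized robust Bellman target for sampled transitions:
    # each sampled next state contributes min(V(s'), min_s V(s) + lam),
    # the closed form of inf_P { E_P[V] + lam * TV(P, P0) } (assumed here).
    return rews + gamma * np.minimum(v_next, v_min + lam)

def robust_tv_fitted_q_iteration(feats, acts, rews, feats_next,
                                 n_actions, lam, gamma=0.99, n_iters=50):
    # Sketch of robust regularized fitted Q-iteration (TV case) with a
    # linear function class Q(s, a) = feats(s)^T w[a], fit by least squares
    # on an offline batch of (feature, action, reward, next-feature) tuples.
    n, d = feats.shape
    w = np.zeros((n_actions, d))            # one weight vector per action
    for _ in range(n_iters):
        q_next = feats_next @ w.T           # (n, n_actions) Q-values at next states
        v_next = q_next.max(axis=1)         # greedy value estimates
        v_min = v_next.min()                # batch proxy for min_s V(s)
        y = tv_regularized_target(rews, v_next, lam, v_min, gamma)
        for a in range(n_actions):          # per-action least-squares regression
            mask = acts == a
            if mask.any():
                w[a], *_ = np.linalg.lstsq(feats[mask], y[mask], rcond=None)
    return w

A hybrid variant in the spirit of the second contribution would, roughly, interleave these regression steps with fresh online rollouts, appending the newly sampled transitions to the offline batch; the precise algorithms and their guarantees are in the paper.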

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-panaganti24a,
  title     = {Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data},
  author    = {Panaganti, Kishan and Wierman, Adam and Mazumdar, Eric},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {39324--39363},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/panaganti24a/panaganti24a.pdf},
  url       = {https://proceedings.mlr.press/v235/panaganti24a.html}
}
Endnote
%0 Conference Paper
%T Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data
%A Kishan Panaganti
%A Adam Wierman
%A Eric Mazumdar
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-panaganti24a
%I PMLR
%P 39324--39363
%U https://proceedings.mlr.press/v235/panaganti24a.html
%V 235
APA
Panaganti, K., Wierman, A. & Mazumdar, E. (2024). Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:39324-39363. Available from https://proceedings.mlr.press/v235/panaganti24a.html.
