Robust exploration with adversary via Langevin Monte Carlo

Hao-Lun Hsu, Miroslav Pajic
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:1592-1605, 2024.

Abstract

In the realm of Deep Q-Networks (DQNs), numerous exploration strategies have demonstrated efficacy in controlled environments. However, these methods face formidable challenges when confronted with the unpredictability of real-world scenarios marked by disturbances, and how to explore efficiently under such disturbances has not been fully investigated. In response to these challenges, this work introduces a versatile reinforcement learning (RL) framework that systematically addresses the interplay between exploration and robustness in dynamic and unpredictable environments. We propose a robust RL methodology framed within a two-player max-min adversarial paradigm, cast as a Probabilistic Action Robust Markov Decision Process (MDP) and grounded in a cyber-physical perspective. Our methodology capitalizes on Langevin Monte Carlo (LMC) for Q-function exploration, facilitating iterative updates that allow both the protagonist and the adversary to explore effectively. Notably, we extend this adversarial training paradigm to encompass robustness against episodes with delayed feedback. Empirical evaluation on benchmark problems such as N-Chain and deep brain stimulation underscores the consistent superiority of our method over baseline approaches across diverse perturbation scenarios and instances of delayed feedback.
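To make the two ingredients named in the abstract concrete, the sketch below (PyTorch) illustrates how probabilistic action-robust action selection and an SGLD-style Langevin Monte Carlo update of Q-network parameters might be combined. This is a minimal sketch under assumed names and hyperparameters (QNet, mixed_action, lmc_update, alpha, step_size, inverse_temp), not the authors' implementation.

import torch
import torch.nn as nn

class QNet(nn.Module):
    """A small Q-network; the architecture here is an illustrative assumption."""
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )
    def forward(self, state):
        return self.net(state)

def mixed_action(q_protagonist, q_adversary, state, alpha):
    """Probabilistic action-robust selection: with probability alpha the
    adversary's (return-minimizing) action is executed instead of the
    protagonist's greedy action."""
    with torch.no_grad():
        if torch.rand(()) < alpha:
            return int(q_adversary(state).argmin())
        return int(q_protagonist(state).argmax())

def lmc_update(q, q_target, batch, gamma=0.99, step_size=1e-3, inverse_temp=1e4):
    """One Langevin Monte Carlo (SGLD-style) step on the Q-network parameters:
    theta <- theta - eta * grad(TD loss) + sqrt(2 * eta / beta) * N(0, I).
    The injected Gaussian noise is what drives exploration of the Q-function."""
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        bootstrap = q_target(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * bootstrap
    q_sa = q(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, targets)

    q.zero_grad()
    loss.backward()
    noise_scale = (2.0 * step_size / inverse_temp) ** 0.5
    with torch.no_grad():
        for p in q.parameters():
            p -= step_size * p.grad                 # gradient descent on TD loss
            p += noise_scale * torch.randn_like(p)  # Langevin (exploration) noise

In the max-min formulation, the protagonist and the adversary would each maintain such a Q-network and be updated in alternation with lmc_update, with the adversary's objective set to minimize the protagonist's return (an assumption here; the bootstrap above is written for the maximizing player). The paper's delayed-feedback extension is not reflected in this sketch.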

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-hsu24a,
  title     = {Robust exploration with adversary via {L}angevin {M}onte {C}arlo},
  author    = {Hsu, Hao-Lun and Pajic, Miroslav},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages     = {1592--1605},
  year      = {2024},
  editor    = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume    = {242},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v242/hsu24a/hsu24a.pdf},
  url       = {https://proceedings.mlr.press/v242/hsu24a.html}
}
Endnote
%0 Conference Paper
%T Robust exploration with adversary via Langevin Monte Carlo
%A Hao-Lun Hsu
%A Miroslav Pajic
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-hsu24a
%I PMLR
%P 1592--1605
%U https://proceedings.mlr.press/v242/hsu24a.html
%V 242
APA
Hsu, H. & Pajic, M. (2024). Robust exploration with adversary via Langevin Monte Carlo. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:1592-1605. Available from https://proceedings.mlr.press/v242/hsu24a.html.
