Offline Reward Perturbation Boosts Distributional Shift in Online RL
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:4041-4055, 2024.
Abstract
Offline-to-online reinforcement learning has recently been shown to be effective in reducing online sample complexity by first training on offline-collected data. However, this additional data source may also invite new poisoning attacks that target offline training. In this work, we reveal such vulnerabilities in critic-regularized offline RL by proposing a novel data poisoning attack method that is stealthy in the sense that performance during offline training remains intact, while the online fine-tuning stage suffers a significant performance drop. Our method leverages techniques from bi-level optimization to promote over-estimation and distribution shift under offline-to-online reinforcement learning. Experiments on four environments confirm that the attack satisfies this new stealthiness requirement, and that it can be effective with only a small poisoning budget and without white-box access to the victim model.
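For intuition only, a reward-poisoning attack of this kind can be sketched as a bi-level problem: the attacker perturbs the logged rewards within a budget so that the policy obtained by the victim's offline training degrades once fine-tuned online. The formulation below is a generic illustration, not the paper's exact objective; the perturbation $\delta$, budget $\epsilon$, attacker objective $J_{\text{shift}}$, and offline loss $\mathcal{L}_{\text{offline}}$ are all assumed symbols.

```latex
% Illustrative bi-level reward-poisoning sketch (all symbols are assumptions)
\max_{\|\delta\|_\infty \le \epsilon} \; J_{\text{shift}}\!\bigl(\pi_{\theta^\ast(\delta)}\bigr)
\quad \text{s.t.} \quad
\theta^\ast(\delta) \in \arg\min_{\theta} \;
\mathcal{L}_{\text{offline}}\!\bigl(\theta;\, \{(s_i, a_i, r_i + \delta_i, s_i')\}_{i=1}^{N}\bigr)
```

Here the inner problem is the victim's (e.g., critic-regularized) offline RL training on the dataset with perturbed rewards $r_i + \delta_i$, and the outer problem chooses the perturbation, within budget $\epsilon$, to maximize a stand-in measure of over-estimation or distribution shift that only manifests during online fine-tuning.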