Collapsing Sequence-Level Data-Policy Coverage via Poisoning Attack in Offline Reinforcement Learning

Xue Zhou, Dapeng Man, Chen Xu, Fanyi Zeng, Tao Liu, Huan Wang, Shucheng He, Chaoyang Gao, Wu Yang
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:5084-5098, 2025.

Abstract

Offline reinforcement learning (RL) relies heavily on how well the pre-collected data covers the target policy’s distribution. Existing studies aim to improve data-policy coverage to mitigate distributional shift, but they overlook the security risks posed by insufficient coverage, and their single-step analyses are inconsistent with the multi-step decision-making nature of offline RL. To address this, we introduce the sequence-level concentrability coefficient to quantify coverage, and show through theoretical analysis that it exponentially amplifies the upper bound on estimation error. Building on this, we propose the Collapsing Sequence-Level Data-Policy Coverage (CSDPC) poisoning attack. To accommodate the continuous nature of offline RL data, we convert state-action pairs into decision units and extract representative decision patterns that capture multi-step behavior. We then identify rare patterns that are likely to cause insufficient coverage and poison them, reducing coverage and exacerbating distributional shift. Experiments show that poisoning just 1% of the dataset can degrade agent performance by 90%. These findings offer a new perspective on analyzing and safeguarding the security of offline RL.
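The key quantity above can be made concrete. A plausible formalization (an assumption for illustration; the paper's exact definition may differ): writing \mu for the behavior policy that generated the data and \pi for the target policy, a per-step concentrability coefficient is C = \sup_{s,a} \pi(a \mid s)/\mu(a \mid s), and a sequence-level analogue over length-H trajectories \tau = (s_1, a_1, \ldots, s_H, a_H) multiplies the per-step ratios (the shared dynamics terms cancel):

\[
C_{\mathrm{seq}} \;=\; \sup_{\tau} \frac{P^{\pi}(\tau)}{P^{\mu}(\tau)}
\;=\; \sup_{\tau} \prod_{h=1}^{H} \frac{\pi(a_h \mid s_h)}{\mu(a_h \mid s_h)}
\;\le\; C^{H}.
\]

Under this reading, error bounds that scale with C_seq grow exponentially in the horizon once even a few trajectory patterns are poorly covered, which is the leverage a coverage-collapsing attack exploits.

The pipeline named in the abstract can also be sketched in a few lines. The following Python is a minimal illustration of discretization into decision units, rare-pattern identification, and budgeted poisoning; all function names, bin counts, and noise parameters here are hypothetical choices for the sketch, not the authors' implementation:

import numpy as np
from collections import Counter

def discretize(x, n_bins=8, low=-1.0, high=1.0):
    """Map a continuous vector to a tuple of per-dimension bin indices."""
    edges = np.linspace(low, high, n_bins - 1)  # n_bins-1 edges -> n_bins bins
    return tuple(int(np.digitize(v, edges)) for v in x)

def decision_units(states, actions, unit_len=4):
    """Group consecutive discretized (state, action) pairs into fixed-length
    'decision units' that capture multi-step behavior."""
    pairs = [discretize(np.concatenate([s, a])) for s, a in zip(states, actions)]
    return [tuple(pairs[i:i + unit_len]) for i in range(len(pairs) - unit_len + 1)]

def rare_pattern_indices(units, rare_quantile=0.05):
    """Indices of units whose decision pattern is rarest in the dataset;
    removing or corrupting these most reduces sequence-level coverage."""
    counts = Counter(units)
    freqs = np.array([counts[u] for u in units])
    return np.where(freqs <= np.quantile(freqs, rare_quantile))[0]

def poison_actions(actions, unit_indices, budget=0.01, noise_scale=0.5, seed=0):
    """Perturb actions at the first transition of each selected rare unit,
    spending at most `budget` * len(actions) edits (1% in the paper's experiments)."""
    rng = np.random.default_rng(seed)
    k = min(len(unit_indices), max(1, int(budget * len(actions))))
    chosen = rng.choice(unit_indices, size=k, replace=False)
    poisoned = np.array(actions, dtype=float, copy=True)
    poisoned[chosen] += rng.normal(0.0, noise_scale, size=poisoned[chosen].shape)
    return poisoned

# Toy usage: 500 transitions with 3-dim states and 1-dim actions.
states = np.random.default_rng(1).normal(size=(500, 3))
actions = np.random.default_rng(2).uniform(-1, 1, size=(500, 1))
units = decision_units(states, actions)
rare = rare_pattern_indices(units)
actions_poisoned = poison_actions(actions, rare)  # at most 1% of transitions changed

Note that unit index i stands in for the window starting at transition i, so the sketch perturbs only the first step of each rare window; any mapping from windows to poisoned transitions would serve the same illustrative purpose.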

Cite this Paper


BibTeX
@InProceedings{pmlr-v286-zhou25a,
  title     = {Collapsing Sequence-Level Data-Policy Coverage via Poisoning Attack in Offline Reinforcement Learning},
  author    = {Zhou, Xue and Man, Dapeng and Xu, Chen and Zeng, Fanyi and Liu, Tao and Wang, Huan and He, Shucheng and Gao, Chaoyang and Yang, Wu},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  pages     = {5084--5098},
  year      = {2025},
  editor    = {Chiappa, Silvia and Magliacane, Sara},
  volume    = {286},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--25 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v286/main/assets/zhou25a/zhou25a.pdf},
  url       = {https://proceedings.mlr.press/v286/zhou25a.html},
  abstract  = {Offline reinforcement learning (RL) heavily relies on the coverage of pre-collected data over the target policy’s distribution. Existing studies aim to improve data-policy coverage to mitigate distributional shifts, but overlook security risks from insufficient coverage, and the single-step analysis is not consistent with the multi-step decision-making nature of offline RL. To address this, we introduce the sequence-level concentrability coefficient to quantify coverage, and reveal its exponential amplification on the upper bound of estimation errors through theoretical analysis. Building on this, we propose the Collapsing Sequence-Level Data-Policy Coverage (CSDPC) poisoning attack. Considering the continuous nature of offline RL data, we convert state-action pairs into decision units, and extract representative decision patterns that capture multi-step behavior. We identify rare patterns likely to cause insufficient coverage, and poison them to reduce coverage and exacerbate distributional shifts. Experiments show that poisoning just 1% of the dataset can degrade agent performance by 90%. This finding provides new perspectives for analyzing and safeguarding the security of offline RL.}
}
Endnote
%0 Conference Paper
%T Collapsing Sequence-Level Data-Policy Coverage via Poisoning Attack in Offline Reinforcement Learning
%A Xue Zhou
%A Dapeng Man
%A Chen Xu
%A Fanyi Zeng
%A Tao Liu
%A Huan Wang
%A Shucheng He
%A Chaoyang Gao
%A Wu Yang
%B Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2025
%E Silvia Chiappa
%E Sara Magliacane
%F pmlr-v286-zhou25a
%I PMLR
%P 5084--5098
%U https://proceedings.mlr.press/v286/zhou25a.html
%V 286
%X Offline reinforcement learning (RL) heavily relies on the coverage of pre-collected data over the target policy’s distribution. Existing studies aim to improve data-policy coverage to mitigate distributional shifts, but overlook security risks from insufficient coverage, and the single-step analysis is not consistent with the multi-step decision-making nature of offline RL. To address this, we introduce the sequence-level concentrability coefficient to quantify coverage, and reveal its exponential amplification on the upper bound of estimation errors through theoretical analysis. Building on this, we propose the Collapsing Sequence-Level Data-Policy Coverage (CSDPC) poisoning attack. Considering the continuous nature of offline RL data, we convert state-action pairs into decision units, and extract representative decision patterns that capture multi-step behavior. We identify rare patterns likely to cause insufficient coverage, and poison them to reduce coverage and exacerbate distributional shifts. Experiments show that poisoning just 1% of the dataset can degrade agent performance by 90%. This finding provides new perspectives for analyzing and safeguarding the security of offline RL.
APA
Zhou, X., Man, D., Xu, C., Zeng, F., Liu, T., Wang, H., He, S., Gao, C. & Yang, W. (2025). Collapsing Sequence-Level Data-Policy Coverage via Poisoning Attack in Offline Reinforcement Learning. Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 286:5084-5098. Available from https://proceedings.mlr.press/v286/zhou25a.html.
