Stealthy Imitation: Reward-guided Environment-free Policy Stealing

Zhixiong Zhuang, Maria-Irina Nicolae, Mario Fritz
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62682-62706, 2024.

Abstract

Deep reinforcement learning policies, which are integral to modern control systems, represent valuable intellectual property. The development of these policies demands considerable resources, such as domain expertise, simulation fidelity, and real-world validation. These policies are potentially vulnerable to model stealing attacks, which aim to replicate their functionality using only black-box access. In this paper, we propose Stealthy Imitation, the first attack designed to steal policies without access to the environment or knowledge of the input range. This setup has not been considered by previous model stealing methods. Lacking access to the victim’s input state distribution, Stealthy Imitation fits a reward model that allows it to approximate this distribution. We show that the victim policy is harder to imitate when the distribution of the attack queries matches that of the victim. We evaluate our approach across diverse, high-dimensional control tasks and consistently outperform prior data-free approaches adapted for policy stealing. Lastly, we propose a countermeasure that significantly diminishes the effectiveness of the attack.
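To make the described setting concrete, below is a minimal, illustrative sketch in Python/PyTorch of the query-imitate-refine loop suggested by the abstract. It assumes a diagonal Gaussian estimate of the victim's state distribution, a toy network standing in for the black-box victim, and an ad-hoc per-state error weighting; all names (query_victim, attacker, mu, log_std) and update rules are illustrative assumptions, not the paper's actual Stealthy Imitation algorithm, which instead fits a learned reward model to guide the distribution estimate.

import torch
import torch.nn as nn

# Illustrative sketch only: a diagonal Gaussian stands in for the unknown state
# distribution, and a small random network stands in for the black-box victim.
state_dim, action_dim = 17, 6

victim = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, action_dim))  # stand-in for the black-box victim policy

def query_victim(states):
    # Black-box access: the attacker only observes the returned actions.
    with torch.no_grad():
        return victim(states)

mu = torch.zeros(state_dim)        # estimated mean of the victim's state distribution
log_std = torch.zeros(state_dim)   # estimated log std (diagonal Gaussian)
attacker = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                         nn.Linear(256, action_dim))
opt = torch.optim.Adam(attacker.parameters(), lr=1e-3)

for round_idx in range(100):
    # 1) Sample attack queries from the current distribution estimate.
    states = mu + log_std.exp() * torch.randn(1024, state_dim)
    victim_actions = query_victim(states)

    # 2) Behavioral cloning: imitate the victim on the sampled states.
    for _ in range(10):
        loss = nn.functional.mse_loss(attacker(states), victim_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # 3) Treat imitation difficulty as a guidance signal: states that remain hard
    #    to imitate are weighted more heavily when refitting the Gaussian estimate.
    #    (In the paper, a learned reward model plays this role.)
    with torch.no_grad():
        per_state_err = (attacker(states) - victim_actions).pow(2).mean(dim=1)
        weights = per_state_err / per_state_err.sum()
        mu = (weights[:, None] * states).sum(dim=0)
        var = (weights[:, None] * (states - mu).pow(2)).sum(dim=0)
        log_std = 0.5 * var.clamp_min(1e-6).log()

The structural idea reflected here is the one stated in the abstract: imitation difficulty serves as feedback for refining the attacker's estimate of the victim's input state distribution.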

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zhuang24a,
  title     = {Stealthy Imitation: Reward-guided Environment-free Policy Stealing},
  author    = {Zhuang, Zhixiong and Nicolae, Maria-Irina and Fritz, Mario},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {62682--62706},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhuang24a/zhuang24a.pdf},
  url       = {https://proceedings.mlr.press/v235/zhuang24a.html},
  abstract  = {Deep reinforcement learning policies, which are integral to modern control systems, represent valuable intellectual property. The development of these policies demands considerable resources, such as domain expertise, simulation fidelity, and real-world validation. These policies are potentially vulnerable to model stealing attacks, which aim to replicate their functionality using only black-box access. In this paper, we propose Stealthy Imitation, the first attack designed to steal policies without access to the environment or knowledge of the input range. This setup has not been considered by previous model stealing methods. Lacking access to the victim’s input states distribution, Stealthy Imitation fits a reward model that allows to approximate it. We show that the victim policy is harder to imitate when the distribution of the attack queries matches that of the victim. We evaluate our approach across diverse, high-dimensional control tasks and consistently outperform prior data-free approaches adapted for policy stealing. Lastly, we propose a countermeasure that significantly diminishes the effectiveness of the attack.}
}
Endnote
%0 Conference Paper
%T Stealthy Imitation: Reward-guided Environment-free Policy Stealing
%A Zhixiong Zhuang
%A Maria-Irina Nicolae
%A Mario Fritz
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zhuang24a
%I PMLR
%P 62682--62706
%U https://proceedings.mlr.press/v235/zhuang24a.html
%V 235
%X Deep reinforcement learning policies, which are integral to modern control systems, represent valuable intellectual property. The development of these policies demands considerable resources, such as domain expertise, simulation fidelity, and real-world validation. These policies are potentially vulnerable to model stealing attacks, which aim to replicate their functionality using only black-box access. In this paper, we propose Stealthy Imitation, the first attack designed to steal policies without access to the environment or knowledge of the input range. This setup has not been considered by previous model stealing methods. Lacking access to the victim’s input states distribution, Stealthy Imitation fits a reward model that allows to approximate it. We show that the victim policy is harder to imitate when the distribution of the attack queries matches that of the victim. We evaluate our approach across diverse, high-dimensional control tasks and consistently outperform prior data-free approaches adapted for policy stealing. Lastly, we propose a countermeasure that significantly diminishes the effectiveness of the attack.
APA
Zhuang, Z., Nicolae, M., & Fritz, M. (2024). Stealthy Imitation: Reward-guided Environment-free Policy Stealing. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:62682-62706. Available from https://proceedings.mlr.press/v235/zhuang24a.html.
