Defense against Model Extraction Attack by Bayesian Active Watermarking

Zhenyi Wang, Yihan Wu, Heng Huang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:51913-51935, 2024.

Abstract

Model extraction aims to obtain a cloned model that replicates the functionality of a black-box victim model solely through query-based access. Present defense strategies exhibit shortcomings, manifesting as: (1) computational or memory inefficiencies during deployment; or (2) dependence on expensive defensive training methods that mandate re-training of the victim model; or (3) watermarking-based methods that only passively detect model theft without actively preventing model extraction. To address these limitations, we introduce an innovative Bayesian active watermarking technique to fine-tune the victim model and learn the watermark posterior distribution conditioned on input data. The fine-tuning process aims to maximize the log-likelihood on watermarked in-distribution training data, preserving model utility, while simultaneously maximizing the change in the model’s outputs on watermarked out-of-distribution data, thereby achieving effective defense. During deployment, a watermark is randomly sampled from the estimated watermark posterior. This watermark is then added to the input query, and the victim model returns to users the prediction based on the watermarked input query. This proactive defense approach requires only slight fine-tuning of the victim model, without the need for full re-training, and demonstrates high efficiency in terms of memory and computation during deployment. Rigorous theoretical analysis and comprehensive experimental results demonstrate the efficacy of our proposed method.
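The deployment procedure described in the abstract (sample a watermark from the learned input-conditioned posterior, add it to the query, return the prediction on the watermarked query) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the diagonal-Gaussian posterior `sample_watermark` and the linear `victim_predict` model are hypothetical stand-ins for the learned posterior network and the fine-tuned victim model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_watermark(x, scale=0.05):
    """Hypothetical amortized posterior q(w | x): a diagonal Gaussian
    whose mean depends on the query (stand-in for the learned network)."""
    mu = 0.1 * np.tanh(x)                      # input-conditioned mean
    return mu + scale * rng.standard_normal(x.shape)

# Stand-in victim model: a fixed linear classifier with softmax output.
W = rng.standard_normal((3, 4))

def victim_predict(x):
    logits = W @ x
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()

def serve_query(x):
    """Deployment path: every query is watermarked before inference,
    so an attacker only ever observes outputs on perturbed inputs."""
    w = sample_watermark(x)
    return victim_predict(x + w)

query = rng.standard_normal(4)
probs = serve_query(query)
```

Because the watermark is re-sampled per query, repeated identical queries receive stochastically perturbed predictions, which degrades the extraction signal while (per the paper's objective) the fine-tuning keeps in-distribution utility intact.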

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wang24cb,
  title     = {Defense against Model Extraction Attack by {B}ayesian Active Watermarking},
  author    = {Wang, Zhenyi and Wu, Yihan and Huang, Heng},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {51913--51935},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24cb/wang24cb.pdf},
  url       = {https://proceedings.mlr.press/v235/wang24cb.html},
  abstract  = {Model extraction is to obtain a cloned model that replicates the functionality of a black-box victim model solely through query-based access. Present defense strategies exhibit shortcomings, manifesting as: (1) computational or memory inefficiencies during deployment; or (2) dependence on expensive defensive training methods that mandate the re-training of the victim model; or (3) watermarking-based methods only passively detect model theft without actively preventing model extraction. To address these limitations, we introduce an innovative Bayesian active watermarking technique to fine-tune the victim model and learn the watermark posterior distribution conditioned on input data. The fine-tuning process aims to maximize the log-likelihood on watermarked in-distribution training data for preserving model utility while simultaneously maximizing the change of model’s outputs on watermarked out-of-distribution data, thereby achieving effective defense. During deployment, a watermark is randomly sampled from the estimated watermark posterior. This watermark is then added to the input query, and the victim model returns the prediction based on the watermarked input query to users. This proactive defense approach requires only slight fine-tuning of the victim model without the need of full re-training and demonstrates high efficiency in terms of memory and computation during deployment. Rigorous theoretical analysis and comprehensive experimental results demonstrate the efficacy of our proposed method.}
}
Endnote
%0 Conference Paper
%T Defense against Model Extraction Attack by Bayesian Active Watermarking
%A Zhenyi Wang
%A Yihan Wu
%A Heng Huang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wang24cb
%I PMLR
%P 51913--51935
%U https://proceedings.mlr.press/v235/wang24cb.html
%V 235
%X Model extraction is to obtain a cloned model that replicates the functionality of a black-box victim model solely through query-based access. Present defense strategies exhibit shortcomings, manifesting as: (1) computational or memory inefficiencies during deployment; or (2) dependence on expensive defensive training methods that mandate the re-training of the victim model; or (3) watermarking-based methods only passively detect model theft without actively preventing model extraction. To address these limitations, we introduce an innovative Bayesian active watermarking technique to fine-tune the victim model and learn the watermark posterior distribution conditioned on input data. The fine-tuning process aims to maximize the log-likelihood on watermarked in-distribution training data for preserving model utility while simultaneously maximizing the change of model’s outputs on watermarked out-of-distribution data, thereby achieving effective defense. During deployment, a watermark is randomly sampled from the estimated watermark posterior. This watermark is then added to the input query, and the victim model returns the prediction based on the watermarked input query to users. This proactive defense approach requires only slight fine-tuning of the victim model without the need of full re-training and demonstrates high efficiency in terms of memory and computation during deployment. Rigorous theoretical analysis and comprehensive experimental results demonstrate the efficacy of our proposed method.
APA
Wang, Z., Wu, Y. & Huang, H. (2024). Defense against Model Extraction Attack by Bayesian Active Watermarking. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:51913-51935. Available from https://proceedings.mlr.press/v235/wang24cb.html.