Certified Neural Network Watermarks with Randomized Smoothing

Arpit Bansal, Ping-Yeh Chiang, Michael J Curry, Rajiv Jain, Curtis Wigington, Varun Manjunatha, John P Dickerson, Tom Goldstein
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:1450-1465, 2022.

Abstract

Watermarking is a commonly used strategy to protect creators’ rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models – in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose the first certifiable watermarking method. Using the randomized smoothing technique, we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain $\ell_2$ threshold. In addition to being certifiable, our watermark is also empirically more robust compared to previous watermarking methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-bansal22a,
  title = {Certified Neural Network Watermarks with Randomized Smoothing},
  author = {Bansal, Arpit and Chiang, Ping-Yeh and Curry, Michael J and Jain, Rajiv and Wigington, Curtis and Manjunatha, Varun and Dickerson, John P and Goldstein, Tom},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {1450--1465},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/bansal22a/bansal22a.pdf},
  url = {https://proceedings.mlr.press/v162/bansal22a.html},
  abstract = {Watermarking is a commonly used strategy to protect creators’ rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models – in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose the first certifiable watermarking method. Using the randomized smoothing technique, we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain $\ell_2$ threshold. In addition to being certifiable, our watermark is also empirically more robust compared to previous watermarking methods.}
}
Endnote
%0 Conference Paper
%T Certified Neural Network Watermarks with Randomized Smoothing
%A Arpit Bansal
%A Ping-Yeh Chiang
%A Michael J Curry
%A Rajiv Jain
%A Curtis Wigington
%A Varun Manjunatha
%A John P Dickerson
%A Tom Goldstein
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-bansal22a
%I PMLR
%P 1450--1465
%U https://proceedings.mlr.press/v162/bansal22a.html
%V 162
%X Watermarking is a commonly used strategy to protect creators’ rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models – in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose the first certifiable watermarking method. Using the randomized smoothing technique, we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain $\ell_2$ threshold. In addition to being certifiable, our watermark is also empirically more robust compared to previous watermarking methods.
APA
Bansal, A., Chiang, P., Curry, M. J., Jain, R., Wigington, C., Manjunatha, V., Dickerson, J. P., & Goldstein, T. (2022). Certified Neural Network Watermarks with Randomized Smoothing. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:1450-1465. Available from https://proceedings.mlr.press/v162/bansal22a.html

Related Material