On the Difficulty of Defending Self-Supervised Learning against Model Extraction

Adam Dziedzic, Nikita Dhawan, Muhammad Ahmad Kaleem, Jonas Guan, Nicolas Papernot
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:5757-5776, 2022.

Abstract

Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels. These representations encode similarity structures that enable efficient learning of multiple downstream tasks. Recently, ML-as-a-Service providers have commenced offering trained SSL models over inference APIs, which transform user inputs into useful representations for a fee. However, the high cost of training these models and their exposure over APIs both make black-box extraction a realistic security threat. We thus explore model stealing attacks against SSL. Unlike traditional model extraction on classifiers that output labels, the victim models here output representations; these representations are of significantly higher dimensionality compared to the low-dimensional prediction scores output by classifiers. We construct several novel attacks and find that approaches that train directly on a victim’s stolen representations are query efficient and enable high accuracy for downstream models. We then show that existing defenses against model extraction are inadequate and not easily retrofitted to the specificities of SSL.
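
As a rough illustration of the direct-extraction idea described above (training a stolen encoder to reproduce the victim's representations), the following minimal PyTorch sketch shows an attacker using a black-box encoder's outputs as regression targets. The architectures, the 2048-dimensional output size, the synthetic query images, and the MSE objective are all illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumptions, not the authors' code): the attacker queries a
# black-box SSL encoder for representations and trains a local copy to match them.
import torch
import torch.nn as nn
import torchvision

# Stand-ins for the victim encoder served behind an inference API and the
# attacker's stolen encoder; architectures and output size are assumptions.
victim = torchvision.models.resnet50(num_classes=2048).eval()
stolen = torchvision.models.resnet18(num_classes=2048)

optimizer = torch.optim.Adam(stolen.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # one simple choice of objective for matching representations

for step in range(3):                      # a few illustrative training steps
    x = torch.randn(8, 3, 224, 224)        # placeholder for attacker-chosen query images
    with torch.no_grad():
        target = victim(x)                 # representations returned by the API
    pred = stolen(x)
    loss = loss_fn(pred, target)           # train directly on the stolen representations
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")

In a full attack, the queries would come from an attacker-chosen dataset and the objective could be a contrastive loss rather than MSE; the point is only that the high-dimensional representations returned by the API serve directly as training targets for the stolen model.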

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-dziedzic22a,
  title     = {On the Difficulty of Defending Self-Supervised Learning against Model Extraction},
  author    = {Dziedzic, Adam and Dhawan, Nikita and Kaleem, Muhammad Ahmad and Guan, Jonas and Papernot, Nicolas},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {5757--5776},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/dziedzic22a/dziedzic22a.pdf},
  url       = {https://proceedings.mlr.press/v162/dziedzic22a.html},
  abstract  = {Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels. These representations encode similarity structures that enable efficient learning of multiple downstream tasks. Recently, ML-as-a-Service providers have commenced offering trained SSL models over inference APIs, which transform user inputs into useful representations for a fee. However, the high cost involved to train these models and their exposure over APIs both make black-box extraction a realistic security threat. We thus explore model stealing attacks against SSL. Unlike traditional model extraction on classifiers that output labels, the victim models here output representations; these representations are of significantly higher dimensionality compared to the low-dimensional prediction scores output by classifiers. We construct several novel attacks and find that approaches that train directly on a victim’s stolen representations are query efficient and enable high accuracy for downstream models. We then show that existing defenses against model extraction are inadequate and not easily retrofitted to the specificities of SSL.}
}
Endnote
%0 Conference Paper
%T On the Difficulty of Defending Self-Supervised Learning against Model Extraction
%A Adam Dziedzic
%A Nikita Dhawan
%A Muhammad Ahmad Kaleem
%A Jonas Guan
%A Nicolas Papernot
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-dziedzic22a
%I PMLR
%P 5757--5776
%U https://proceedings.mlr.press/v162/dziedzic22a.html
%V 162
%X Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels. These representations encode similarity structures that enable efficient learning of multiple downstream tasks. Recently, ML-as-a-Service providers have commenced offering trained SSL models over inference APIs, which transform user inputs into useful representations for a fee. However, the high cost involved to train these models and their exposure over APIs both make black-box extraction a realistic security threat. We thus explore model stealing attacks against SSL. Unlike traditional model extraction on classifiers that output labels, the victim models here output representations; these representations are of significantly higher dimensionality compared to the low-dimensional prediction scores output by classifiers. We construct several novel attacks and find that approaches that train directly on a victim’s stolen representations are query efficient and enable high accuracy for downstream models. We then show that existing defenses against model extraction are inadequate and not easily retrofitted to the specificities of SSL.
APA
Dziedzic, A., Dhawan, N., Kaleem, M. A., Guan, J., & Papernot, N. (2022). On the Difficulty of Defending Self-Supervised Learning against Model Extraction. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:5757-5776. Available from https://proceedings.mlr.press/v162/dziedzic22a.html.