Black-box Adversarial Attacks with Limited Queries and Information

Andrew Ilyas, Logan Engstrom, Anish Athalye, Jessy Lin
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2137-2146, 2018.

Abstract

Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model where the adversary can observe the full output of the network on arbitrarily many chosen inputs. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to break the Google Cloud Vision API.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-ilyas18a, title = {Black-box Adversarial Attacks with Limited Queries and Information}, author = {Ilyas, Andrew and Engstrom, Logan and Athalye, Anish and Lin, Jessy}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {2137--2146}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/ilyas18a/ilyas18a.pdf}, url = {https://proceedings.mlr.press/v80/ilyas18a.html}, abstract = {Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model where the adversary can observe the full output of the network on arbitrarily many chosen inputs. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to break the Google Cloud Vision API.} }
Endnote
%0 Conference Paper %T Black-box Adversarial Attacks with Limited Queries and Information %A Andrew Ilyas %A Logan Engstrom %A Anish Athalye %A Jessy Lin %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-ilyas18a %I PMLR %P 2137--2146 %U https://proceedings.mlr.press/v80/ilyas18a.html %V 80 %X Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model where the adversary can observe the full output of the network on arbitrarily many chosen inputs. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to break the Google Cloud Vision API.
APA
Ilyas, A., Engstrom, L., Athalye, A. & Lin, J.. (2018). Black-box Adversarial Attacks with Limited Queries and Information. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2137-2146 Available from https://proceedings.mlr.press/v80/ilyas18a.html.

Related Material