Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes

Sina Baharlouei, Fatemeh Sheikholeslami, Meisam Razaviyayn, Zico Kolter
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:11059-11078, 2023.

Abstract

This work concerns the development of deep networks that are certifiably robust to adversarial attacks. Joint robust classification-detection was recently introduced as a certified defense mechanism, where adversarial examples are either correctly classified or assigned to the “abstain” class. In this work, we show that such a provable framework can benefit from extension to networks with multiple explicit abstain classes, to which adversarial examples are adaptively assigned. We show that naïvely adding multiple abstain classes can lead to “model degeneracy”; we then propose a regularization approach and a training method that counter this degeneracy by promoting full use of the multiple abstain classes. Our experiments demonstrate that the proposed approach consistently achieves a favorable trade-off between standard and robust verified accuracy, outperforming state-of-the-art algorithms across various numbers of abstain classes.
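To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' actual formulation) of a classifier whose output layer carries several explicit abstain classes, together with a toy regularizer that discourages the degeneracy in which some abstain classes go unused. The module name, architecture, and specific entropy penalty are illustrative assumptions.

# Minimal sketch, assuming a feature extractor upstream. The head has
# num_classes + num_abstain logits; the last num_abstain logits act as
# explicit detection ("abstain") classes. The penalty below is a toy
# stand-in for the paper's regularizer: it pushes the average softmax
# mass on the abstain classes toward uniform, so all of them get used.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointClassifierDetector(nn.Module):
    def __init__(self, feature_dim: int, num_classes: int, num_abstain: int):
        super().__init__()
        self.num_classes = num_classes
        self.num_abstain = num_abstain
        self.head = nn.Linear(feature_dim, num_classes + num_abstain)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Logits over the real classes followed by the abstain classes.
        return self.head(features)

def abstain_balance_penalty(logits: torch.Tensor, num_classes: int) -> torch.Tensor:
    # Hypothetical regularizer: maximize the entropy of the average
    # abstain-class assignment (returned negated, so minimizing the
    # loss maximizes entropy and spreads usage across abstain classes).
    probs = F.softmax(logits, dim=-1)
    abstain_probs = probs[:, num_classes:]          # mass on abstain classes
    avg = abstain_probs.mean(dim=0)
    avg = avg / (avg.sum() + 1e-12)                 # renormalize over abstain classes
    entropy = -(avg * (avg + 1e-12).log()).sum()
    return -entropy

# Usage: combine with the usual classification loss on clean inputs.
model = JointClassifierDetector(feature_dim=128, num_classes=10, num_abstain=3)
x = torch.randn(32, 128)
y = torch.randint(0, 10, (32,))
logits = model(x)
loss = F.cross_entropy(logits, y) + 0.1 * abstain_balance_penalty(logits, 10)
loss.backward()

In the certified-defense setting the paper studies, an adversarial input would count as detected if verification shows it lands in any of the abstain classes; the sketch above only illustrates the multi-abstain head and a balance-promoting penalty, not the verification procedure itself.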

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-baharlouei23a,
  title     = {Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes},
  author    = {Baharlouei, Sina and Sheikholeslami, Fatemeh and Razaviyayn, Meisam and Kolter, Zico},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {11059--11078},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/baharlouei23a/baharlouei23a.pdf},
  url       = {https://proceedings.mlr.press/v206/baharlouei23a.html},
  abstract  = {This work concerns the development of deep networks that are certifiably robust to adversarial attacks. Joint robust classification-detection was recently introduced as a certified defense mechanism, where adversarial examples are either correctly classified or assigned to the “abstain” class. In this work, we show that such a provable framework can benefit from extension to networks with multiple explicit abstain classes, to which adversarial examples are adaptively assigned. We show that naïvely adding multiple abstain classes can lead to “model degeneracy”; we then propose a regularization approach and a training method that counter this degeneracy by promoting full use of the multiple abstain classes. Our experiments demonstrate that the proposed approach consistently achieves a favorable trade-off between standard and robust verified accuracy, outperforming state-of-the-art algorithms across various numbers of abstain classes.}
}
EndNote
%0 Conference Paper
%T Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes
%A Sina Baharlouei
%A Fatemeh Sheikholeslami
%A Meisam Razaviyayn
%A Zico Kolter
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-baharlouei23a
%I PMLR
%P 11059--11078
%U https://proceedings.mlr.press/v206/baharlouei23a.html
%V 206
%X This work concerns the development of deep networks that are certifiably robust to adversarial attacks. Joint robust classification-detection was recently introduced as a certified defense mechanism, where adversarial examples are either correctly classified or assigned to the “abstain” class. In this work, we show that such a provable framework can benefit from extension to networks with multiple explicit abstain classes, to which adversarial examples are adaptively assigned. We show that naïvely adding multiple abstain classes can lead to “model degeneracy”; we then propose a regularization approach and a training method that counter this degeneracy by promoting full use of the multiple abstain classes. Our experiments demonstrate that the proposed approach consistently achieves a favorable trade-off between standard and robust verified accuracy, outperforming state-of-the-art algorithms across various numbers of abstain classes.
APA
Baharlouei, S., Sheikholeslami, F., Razaviyayn, M., & Kolter, Z. (2023). Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:11059-11078. Available from https://proceedings.mlr.press/v206/baharlouei23a.html.
