Improving Vision Model Robustness against Misclassification and Uncertainty Attacks via Underconfidence Adversarial Training

Josué Martínez-Martínez, John T Holodnak, Olivia Brown, Sheida Nabavi, Derek Aguiar, Allan Wollaber
Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), PMLR 307:274-286, 2026.

Abstract

Adversarial robustness research has focused on defending against misclassification attacks. However, such adversarially trained models remain vulnerable to underconfidence adversarial attacks, which reduce the model's confidence without changing the predicted class. Decreased confidence can result in unnecessary interventions, delayed diagnoses, and a weakening of trust in automated systems. In this work, we introduce two novel underconfidence attacks: one that induces ambiguity between a class pair, and ConfSmooth, which spreads uncertainty across all classes. For defense, we propose Underconfidence Adversarial Training (UAT), which embeds our underconfidence attacks in an adversarial training framework. We extensively benchmark our underconfidence attacks and defense strategies across six model architectures (both CNN- and ViT-based) and seven datasets (MNIST, CIFAR, ImageNet, MSTAR, and medical imaging). In 14 of the 15 data-architecture combinations, our attack outperforms the state of the art, often substantially. Our UAT defense maintains the highest robustness against all underconfidence attacks on CIFAR-10, and achieves robustness comparable to or better than adversarial training against misclassification attacks while taking half the gradient steps. By broadening the scope of adversarial robustness to include uncertainty-aware threats and defenses, UAT enables more robust computer vision systems.
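
To make the idea of an underconfidence attack concrete, the following is a minimal PyTorch-style sketch in the spirit of ConfSmooth as described in the abstract: it perturbs an input within an L-infinity ball to maximize predictive entropy (spreading probability mass across all classes) while discarding perturbations that change the predicted class. The PGD-style formulation, step sizes, entropy loss, and prediction-preserving reset are illustrative assumptions, not the paper's exact algorithm.

import torch
import torch.nn.functional as F

def entropy_underconfidence_attack(model, x, eps=8/255, alpha=2/255, steps=10):
    # Sketch of an underconfidence attack: maximize predictive entropy
    # without flipping the predicted label, under an L-infinity constraint
    # of radius eps. Assumes inputs are normalized to [0, 1].
    model.eval()
    with torch.no_grad():
        orig_pred = model(x).argmax(dim=1)

    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        probs = F.softmax(model(x_adv), dim=1)
        entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1).mean()
        grad, = torch.autograd.grad(entropy, x_adv)

        with torch.no_grad():
            # Gradient ascent on entropy, then project back into the eps-ball.
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
            # Revert samples whose predicted class changed: reduced confidence,
            # not misclassification, is the goal of this attack.
            changed = model(x_adv).argmax(dim=1) != orig_pred
            x_adv[changed] = x[changed]
    return x_adv.detach()

A defense along the lines of UAT would presumably generate such underconfidence perturbations during training and fit the model on them with the original labels; the paper itself specifies the exact training procedure.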

Cite this Paper


BibTeX
@InProceedings{pmlr-v307-marti-nez-marti-nez26a,
  title     = {Improving Vision Model Robustness against Misclassification and Uncertainty Attacks via Underconfidence Adversarial Training},
  author    = {Mart{\'\i}nez-Mart{\'\i}nez, Josu{\'e} and Holodnak, John T and Brown, Olivia and Nabavi, Sheida and Aguiar, Derek and Wollaber, Allan},
  booktitle = {Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL)},
  pages     = {274--286},
  year      = {2026},
  editor    = {Kim, Hyeongji and Ramírez Rivera, Adín and Ricaud, Benjamin},
  volume    = {307},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--08 Jan},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v307/main/assets/marti-nez-marti-nez26a/marti-nez-marti-nez26a.pdf},
  url       = {https://proceedings.mlr.press/v307/marti-nez-marti-nez26a.html},
  abstract  = {Adversarial robustness research has focused on defending against misclassification attacks. However, such adversarially trained models remain vulnerable to underconfidence adversarial attacks, which reduce the model's confidence without changing the predicted class. Decreased confidence can result in unnecessary interventions, delayed diagnoses, and a weakening of trust in automated systems. In this work, we introduce two novel underconfidence attacks: one that induces ambiguity between a class pair, and ConfSmooth, which spreads uncertainty across all classes. For defense, we propose Underconfidence Adversarial Training (UAT), which embeds our underconfidence attacks in an adversarial training framework. We extensively benchmark our underconfidence attacks and defense strategies across six model architectures (both CNN- and ViT-based) and seven datasets (MNIST, CIFAR, ImageNet, MSTAR, and medical imaging). In 14 of the 15 data-architecture combinations, our attack outperforms the state of the art, often substantially. Our UAT defense maintains the highest robustness against all underconfidence attacks on CIFAR-10, and achieves robustness comparable to or better than adversarial training against misclassification attacks while taking half the gradient steps. By broadening the scope of adversarial robustness to include uncertainty-aware threats and defenses, UAT enables more robust computer vision systems.}
}
Endnote
%0 Conference Paper
%T Improving Vision Model Robustness against Misclassification and Uncertainty Attacks via Underconfidence Adversarial Training
%A Josué Martínez-Martínez
%A John T Holodnak
%A Olivia Brown
%A Sheida Nabavi
%A Derek Aguiar
%A Allan Wollaber
%B Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL)
%C Proceedings of Machine Learning Research
%D 2026
%E Hyeongji Kim
%E Adín Ramírez Rivera
%E Benjamin Ricaud
%F pmlr-v307-marti-nez-marti-nez26a
%I PMLR
%P 274--286
%U https://proceedings.mlr.press/v307/marti-nez-marti-nez26a.html
%V 307
%X Adversarial robustness research has focused on defending against misclassification attacks. However, such adversarially trained models remain vulnerable to underconfidence adversarial attacks, which reduce the model's confidence without changing the predicted class. Decreased confidence can result in unnecessary interventions, delayed diagnoses, and a weakening of trust in automated systems. In this work, we introduce two novel underconfidence attacks: one that induces ambiguity between a class pair, and ConfSmooth, which spreads uncertainty across all classes. For defense, we propose Underconfidence Adversarial Training (UAT), which embeds our underconfidence attacks in an adversarial training framework. We extensively benchmark our underconfidence attacks and defense strategies across six model architectures (both CNN- and ViT-based) and seven datasets (MNIST, CIFAR, ImageNet, MSTAR, and medical imaging). In 14 of the 15 data-architecture combinations, our attack outperforms the state of the art, often substantially. Our UAT defense maintains the highest robustness against all underconfidence attacks on CIFAR-10, and achieves robustness comparable to or better than adversarial training against misclassification attacks while taking half the gradient steps. By broadening the scope of adversarial robustness to include uncertainty-aware threats and defenses, UAT enables more robust computer vision systems.
APA
Martínez-Martínez, J., Holodnak, J.T., Brown, O., Nabavi, S., Aguiar, D. & Wollaber, A. (2026). Improving Vision Model Robustness against Misclassification and Uncertainty Attacks via Underconfidence Adversarial Training. Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), in Proceedings of Machine Learning Research 307:274-286. Available from https://proceedings.mlr.press/v307/marti-nez-marti-nez26a.html.
