SPARDACUS SafetyCage: A new misclassification detector

Pål Vegard Johnsen, Filippo Remonato, Shawn Benedict, Albert Ndur-Osei
Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL), PMLR 265:133-140, 2025.

Abstract

Given the increasing adoption of machine learning techniques in society and industry, it is important to put procedures in place that can infer and signal whether the prediction of an ML model may be unreliable. This is not only relevant for ML specialists, but also for laypersons who may be end-users. In this work, we present a new method for flagging possible misclassifications from a feed-forward neural network in a general multi-class problem, called SPARDA-enabled Classification Uncertainty Scorer (SPARDACUS). For each class and layer, the probability distribution functions of the activations for both correctly and wrongly classified samples are recorded. Using a Sparse Difference Analysis (SPARDA) approach, an optimal projection along the direction maximizing the Wasserstein distance enables p-value computations to confirm or reject the class prediction. Importantly, while most existing methods act on the output layer only, our method can in addition be applied on the hidden layers in the neural network, thus being useful in applications, such as feature extraction, that necessarily exploit the intermediate (hidden) layers. We test our method on both a well-performing and under-performing classifier, on different datasets, and compare with other previously published approaches. Notably, while achieving performance on par with two state-of-the-art-level methods, we significantly extend in flexibility and applicability. We further find, for the models and datasets chosen, that the output layer is indeed the most valuable for misclassification detection, and adding information from previous layers does not necessarily improve performance in such cases.
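The core mechanism described above — projecting per-class activations onto a direction that maximizes the 1-D Wasserstein distance between correctly and wrongly classified samples, then scoring a new sample with an empirical p-value — can be illustrated with a simplified sketch. Note this is not the paper's SPARDA optimisation: SPARDA solves a sparse projection problem, whereas the toy version below merely searches over random unit directions; all function names and the toy data are invented for illustration.

```python
import numpy as np

def w1_distance(a, b):
    # 1-D Wasserstein-1 distance between two equal-size empirical samples:
    # the mean absolute difference of their order statistics.
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

def best_projection(X_correct, X_wrong, n_candidates=500, seed=0):
    # Crude stand-in for the SPARDA optimisation: draw random unit
    # directions and keep the one maximising the projected W1 distance.
    rng = np.random.default_rng(seed)
    dim = X_correct.shape[1]
    best_d, best_w = -np.inf, None
    for _ in range(n_candidates):
        w = rng.standard_normal(dim)
        w /= np.linalg.norm(w)
        d = w1_distance(X_correct @ w, X_wrong @ w)
        if d > best_d:
            best_d, best_w = d, w
    return best_w, best_d

def empirical_p_value(x_new, X_reference, w):
    # Two-sided empirical p-value: how extreme is the new sample's
    # projection relative to the correctly-classified reference set?
    proj_ref = X_reference @ w
    tail = np.mean(proj_ref >= x_new @ w)
    return 2.0 * min(tail, 1.0 - tail)

# Toy "activations": the wrongly classified population is shifted
# along a single coordinate, which the search should recover.
rng = np.random.default_rng(1)
Xc = rng.normal(0.0, 1.0, size=(200, 8))   # correctly classified
Xw = rng.normal(0.0, 1.0, size=(200, 8))   # wrongly classified
Xw[:, 0] += 3.0

w, d = best_projection(Xc, Xw)
p = empirical_p_value(Xc[0], Xc, w)
print(f"|w[0]| = {abs(w[0]):.2f}, W1 = {d:.2f}, p = {p:.2f}")
```

With enough candidate directions, the recovered `w` loads mainly on the shifted coordinate, and the p-value for a typical correctly classified sample is unremarkable; a small p-value would instead flag the class prediction as suspect. The full method repeats this per class and per layer, which is what allows it to operate on hidden activations rather than only on the output layer.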

Cite this Paper


BibTeX
@InProceedings{pmlr-v265-johnsen25a,
  title     = {{SPARDACUS} SafetyCage: A new misclassification detector},
  author    = {Johnsen, P{\r{a}}l Vegard and Remonato, Filippo and Benedict, Shawn and Ndur-Osei, Albert},
  booktitle = {Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL)},
  pages     = {133--140},
  year      = {2025},
  editor    = {Lutchyn, Tetiana and Ramírez Rivera, Adín and Ricaud, Benjamin},
  volume    = {265},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--09 Jan},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v265/main/assets/johnsen25a/johnsen25a.pdf},
  url       = {https://proceedings.mlr.press/v265/johnsen25a.html},
  abstract  = {Given the increasing adoption of machine learning techniques in society and industry, it is important to put procedures in place that can infer and signal whether the prediction of an ML model may be unreliable. This is not only relevant for ML specialists, but also for laypersons who may be end-users. In this work, we present a new method for flagging possible misclassifications from a feed-forward neural network in a general multi-class problem, called SPARDA-enabled Classification Uncertainty Scorer (SPARDACUS). For each class and layer, the probability distribution functions of the activations for both correctly and wrongly classified samples are recorded. Using a Sparse Difference Analysis (SPARDA) approach, an optimal projection along the direction maximizing the Wasserstein distance enables $p$-value computations to confirm or reject the class prediction. Importantly, while most existing methods act on the output layer only, our method can in addition be applied on the hidden layers in the neural network, thus being useful in applications, such as feature extraction, that necessarily exploit the intermediate (hidden) layers. We test our method on both a well-performing and under-performing classifier, on different datasets, and compare with other previously published approaches. Notably, while achieving performance on par with two state-of-the-art-level methods, we significantly extend in flexibility and applicability. We further find, for the models and datasets chosen, that the output layer is indeed the most valuable for misclassification detection, and adding information from previous layers does not necessarily improve performance in such cases.}
}
Endnote
%0 Conference Paper
%T SPARDACUS SafetyCage: A new misclassification detector
%A Pål Vegard Johnsen
%A Filippo Remonato
%A Shawn Benedict
%A Albert Ndur-Osei
%B Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL)
%C Proceedings of Machine Learning Research
%D 2025
%E Tetiana Lutchyn
%E Adín Ramírez Rivera
%E Benjamin Ricaud
%F pmlr-v265-johnsen25a
%I PMLR
%P 133--140
%U https://proceedings.mlr.press/v265/johnsen25a.html
%V 265
%X Given the increasing adoption of machine learning techniques in society and industry, it is important to put procedures in place that can infer and signal whether the prediction of an ML model may be unreliable. This is not only relevant for ML specialists, but also for laypersons who may be end-users. In this work, we present a new method for flagging possible misclassifications from a feed-forward neural network in a general multi-class problem, called SPARDA-enabled Classification Uncertainty Scorer (SPARDACUS). For each class and layer, the probability distribution functions of the activations for both correctly and wrongly classified samples are recorded. Using a Sparse Difference Analysis (SPARDA) approach, an optimal projection along the direction maximizing the Wasserstein distance enables $p$-value computations to confirm or reject the class prediction. Importantly, while most existing methods act on the output layer only, our method can in addition be applied on the hidden layers in the neural network, thus being useful in applications, such as feature extraction, that necessarily exploit the intermediate (hidden) layers. We test our method on both a well-performing and under-performing classifier, on different datasets, and compare with other previously published approaches. Notably, while achieving performance on par with two state-of-the-art-level methods, we significantly extend in flexibility and applicability. We further find, for the models and datasets chosen, that the output layer is indeed the most valuable for misclassification detection, and adding information from previous layers does not necessarily improve performance in such cases.
APA
Johnsen, P.V., Remonato, F., Benedict, S. & Ndur-Osei, A. (2025). SPARDACUS SafetyCage: A new misclassification detector. Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL), in Proceedings of Machine Learning Research 265:133-140. Available from https://proceedings.mlr.press/v265/johnsen25a.html.