On the Calibration of Probabilistic Classifier Sets

Thomas Mortier, Viktor Bengs, Eyke Hüllermeier, Stijn Luca, Willem Waegeman
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:8857-8870, 2023.

Abstract

Multi-class classification methods that produce sets of probabilistic classifiers, such as ensemble learning methods, are able to model aleatoric and epistemic uncertainty. Aleatoric uncertainty is then typically quantified via the Bayes error, and epistemic uncertainty via the size of the set. In this paper, we extend the notion of calibration, which is commonly used to evaluate the validity of the aleatoric uncertainty representation of a single probabilistic classifier, to assess the validity of an epistemic uncertainty representation obtained by sets of probabilistic classifiers. Broadly speaking, we call a set of probabilistic classifiers calibrated if one can find a calibrated convex combination of these classifiers. To evaluate this notion of calibration, we propose a novel nonparametric calibration test that generalizes an existing test for single probabilistic classifiers to the case of sets of probabilistic classifiers. Making use of this test, we empirically show that ensembles of deep neural networks are often not well calibrated.
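To make the core notion concrete, the following is a minimal NumPy sketch, not the paper's actual nonparametric test: it forms convex combinations of a set of probabilistic classifiers' predictions and searches the weight simplex for a combination with low miscalibration, here measured by a simple confidence-binned expected calibration error as a stand-in. All function names, the ECE proxy, and the random Dirichlet search over weights are illustrative assumptions; the paper instead generalizes an existing calibration test for single probabilistic classifiers.

import numpy as np

def convex_combination(prob_list, weights):
    # Combine predictions P_1, ..., P_K (each of shape (n_samples, n_classes))
    # into sum_k weights[k] * P_k, with weights on the probability simplex.
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return np.tensordot(weights, np.stack(prob_list), axes=1)

def expected_calibration_error(probs, labels, n_bins=10):
    # Confidence-based ECE: bin the top-class confidence and compare the
    # average confidence with the empirical accuracy inside each bin.
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def best_combination_ece(prob_list, labels, n_trials=2000, seed=0):
    # Crude random search over the weight simplex: the set is deemed
    # (approximately) calibrated if some convex combination scores low.
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_trials):
        w = rng.dirichlet(np.ones(len(prob_list)))
        best = min(best, expected_calibration_error(convex_combination(prob_list, w), labels))
    return best

In this reading, a single probabilistic classifier is the special case of a one-element set, which is why the definition reduces to the usual notion of calibration when the set is a singleton.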

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-mortier23a,
  title     = {On the Calibration of Probabilistic Classifier Sets},
  author    = {Mortier, Thomas and Bengs, Viktor and H\"ullermeier, Eyke and Luca, Stijn and Waegeman, Willem},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {8857--8870},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/mortier23a/mortier23a.pdf},
  url       = {https://proceedings.mlr.press/v206/mortier23a.html},
  abstract  = {Multi-class classification methods that produce sets of probabilistic classifiers, such as ensemble learning methods, are able to model aleatoric and epistemic uncertainty. Aleatoric uncertainty is then typically quantified via the Bayes error, and epistemic uncertainty via the size of the set. In this paper, we extend the notion of calibration, which is commonly used to evaluate the validity of the aleatoric uncertainty representation of a single probabilistic classifier, to assess the validity of an epistemic uncertainty representation obtained by sets of probabilistic classifiers. Broadly speaking, we call a set of probabilistic classifiers calibrated if one can find a calibrated convex combination of these classifiers. To evaluate this notion of calibration, we propose a novel nonparametric calibration test that generalizes an existing test for single probabilistic classifiers to the case of sets of probabilistic classifiers. Making use of this test, we empirically show that ensembles of deep neural networks are often not well calibrated.}
}
Endnote
%0 Conference Paper
%T On the Calibration of Probabilistic Classifier Sets
%A Thomas Mortier
%A Viktor Bengs
%A Eyke Hüllermeier
%A Stijn Luca
%A Willem Waegeman
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-mortier23a
%I PMLR
%P 8857--8870
%U https://proceedings.mlr.press/v206/mortier23a.html
%V 206
%X Multi-class classification methods that produce sets of probabilistic classifiers, such as ensemble learning methods, are able to model aleatoric and epistemic uncertainty. Aleatoric uncertainty is then typically quantified via the Bayes error, and epistemic uncertainty via the size of the set. In this paper, we extend the notion of calibration, which is commonly used to evaluate the validity of the aleatoric uncertainty representation of a single probabilistic classifier, to assess the validity of an epistemic uncertainty representation obtained by sets of probabilistic classifiers. Broadly speaking, we call a set of probabilistic classifiers calibrated if one can find a calibrated convex combination of these classifiers. To evaluate this notion of calibration, we propose a novel nonparametric calibration test that generalizes an existing test for single probabilistic classifiers to the case of sets of probabilistic classifiers. Making use of this test, we empirically show that ensembles of deep neural networks are often not well calibrated.
APA
Mortier, T., Bengs, V., Hüllermeier, E., Luca, S. & Waegeman, W. (2023). On the Calibration of Probabilistic Classifier Sets. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:8857-8870. Available from https://proceedings.mlr.press/v206/mortier23a.html.