A Perfectly Truthful Calibration Measure

Jason Hartline; Lunjia Hu; Yifan Wu

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:3185-3223, 2026.

Abstract

Calibration requires that predictions are conditionally unbiased and, therefore, reliably interpretable as probabilities. A calibration measure quantifies how far a predictor is from perfect calibration. A calibration measure is truthful if it is minimized in expectation when a predictor outputs the ground-truth probabilities. Predicting the true probabilities guarantees perfect calibration, but in reality, when calibration is evaluated on a random sample, all known calibration measures incentivize predictors to lie in order to appear more calibrated. This lack of truthfulness motivated approximately truthful calibration measures in the sequential prediction setting, but no perfectly truthful calibration measure was known to exist even in the more basic batch setting. We design a simple, perfectly and strictly truthful, sound, and complete calibration measure in the batch setting: Averaged Two-Bin Calibration Error (ATB). ATB is quadratically related to two existing calibration measures: the smooth calibration error and the lower distance to calibration. The simplicity of our definition of ATB makes it efficient and straightforward to compute, allowing us to give the first linear-time calibration testing algorithm. We also introduce a general recipe for constructing truthful measures based on the variance additivity of independent random variables, which proves the truthfulness of ATB as a special case and allows us to construct other truthful calibration measures, such as quantile-binned $\ell_2$ Expected Calibration Error (ECE).
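As a rough illustration of the last measure named in the abstract, the following is a minimal sketch of a generic quantile-binned squared ($\ell_2$) calibration error on a batch of predictions. It is a common textbook form, not the paper's definition: the function name `quantile_binned_l2_ece`, the parameter `num_bins`, the equal-count binning by prediction rank, and the per-bin weighting are all assumptions made for illustration, and the exact normalization that makes the measure truthful in the paper may differ.

```python
import numpy as np


def quantile_binned_l2_ece(preds, labels, num_bins=10):
    """Generic quantile-binned squared calibration error (illustrative only).

    Assumption: a textbook form, not the paper's exact definition.
    preds  -- predicted probabilities in [0, 1], shape (n,)
    labels -- binary outcomes in {0, 1}, shape (n,)
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if preds.ndim != 1 or preds.shape != labels.shape:
        raise ValueError("preds and labels must be 1-D arrays of equal length")
    n = preds.size
    if n == 0:
        raise ValueError("at least one sample is required")

    # Sort sample indices by prediction, then cut them into equal-count bins.
    order = np.argsort(preds, kind="stable")
    bins = np.array_split(order, num_bins)

    total = 0.0
    for idx in bins:
        if idx.size == 0:
            continue  # happens only when num_bins exceeds the sample size
        # Squared gap between average outcome and average prediction in the bin.
        gap = labels[idx].mean() - preds[idx].mean()
        total += (idx.size / n) * gap ** 2
    return total


# A predictor whose per-bin averages match the outcomes has zero error.
perfect = quantile_binned_l2_ece([0.0, 0.0, 1.0, 1.0], [0, 0, 1, 1], num_bins=2)

# A predictor that always says 0.5 when every outcome is 1 is off by 0.5 per bin.
biased = quantile_binned_l2_ece([0.5, 0.5, 0.5, 0.5], [1, 1, 1, 1], num_bins=2)
```

Binning by rank rather than by fixed-width intervals keeps every bin equally populated, which is the sense of "quantile-binned" assumed here.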

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-hartline26a,
  title = 	 {A Perfectly Truthful Calibration Measure},
  author =       {Hartline, Jason and Hu, Lunjia and Wu, Yifan},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {3185--3223},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/hartline26a/hartline26a.pdf},
  url = 	 {https://proceedings.mlr.press/v336/hartline26a.html},
  abstract = 	 {Calibration requires that predictions are conditionally unbiased and, therefore, reliably interpretable as probabilities. A calibration measure quantifies how far a predictor is from perfect calibration. A calibration measure is truthful if it is minimized in expectation when a predictor outputs the ground-truth probabilities. Predicting the true probabilities guarantees perfect calibration, but in reality, when calibration is evaluated on a random sample, all known calibration measures incentivize predictors to lie in order to appear more calibrated. This lack of truthfulness motivated approximately truthful calibration measures in the sequential prediction setting, but no perfectly truthful calibration measure was known to exist even in the more basic batch setting. We design a simple, perfectly and strictly truthful, sound, and complete calibration measure in the batch setting: Averaged Two-Bin Calibration Error (ATB). ATB is quadratically related to two existing calibration measures: the smooth calibration error and the lower distance to calibration. The simplicity of our definition of ATB makes it efficient and straightforward to compute, allowing us to give the first linear-time calibration testing algorithm. We also introduce a general recipe for constructing truthful measures based on the variance additivity of independent random variables, which proves the truthfulness of ATB as a special case and allows us to construct other truthful calibration measures, such as quantile-binned $\ell_2$ Expected Calibration Error (ECE).}
}

Endnote

%0 Conference Paper
%T A Perfectly Truthful Calibration Measure
%A Jason Hartline
%A Lunjia Hu
%A Yifan Wu
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore
%F pmlr-v336-hartline26a
%I PMLR
%P 3185--3223
%U https://proceedings.mlr.press/v336/hartline26a.html
%V 336
%X Calibration requires that predictions are conditionally unbiased and, therefore, reliably interpretable as probabilities. A calibration measure quantifies how far a predictor is from perfect calibration. A calibration measure is truthful if it is minimized in expectation when a predictor outputs the ground-truth probabilities. Predicting the true probabilities guarantees perfect calibration, but in reality, when calibration is evaluated on a random sample, all known calibration measures incentivize predictors to lie in order to appear more calibrated. This lack of truthfulness motivated approximately truthful calibration measures in the sequential prediction setting, but no perfectly truthful calibration measure was known to exist even in the more basic batch setting. We design a simple, perfectly and strictly truthful, sound, and complete calibration measure in the batch setting: Averaged Two-Bin Calibration Error (ATB). ATB is quadratically related to two existing calibration measures: the smooth calibration error and the lower distance to calibration. The simplicity of our definition of ATB makes it efficient and straightforward to compute, allowing us to give the first linear-time calibration testing algorithm. We also introduce a general recipe for constructing truthful measures based on the variance additivity of independent random variables, which proves the truthfulness of ATB as a special case and allows us to construct other truthful calibration measures, such as quantile-binned $\ell_2$ Expected Calibration Error (ECE).

APA

Hartline, J., Hu, L. & Wu, Y. (2026). A Perfectly Truthful Calibration Measure. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:3185-3223. Available from https://proceedings.mlr.press/v336/hartline26a.html.
