Truthfulness of Decision-Theoretic Calibration Measures

Mingda Qiao, Eric Zhao
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:4686-4739, 2025.

Abstract

Calibration measures quantify how much a forecaster’s predictions violate calibration, which requires that forecasts are unbiased conditional on the forecasted probabilities. Two important desiderata for a calibration measure are its decision-theoretic implications (i.e., downstream decision-makers that best respond to the forecasts are always no-regret) and its truthfulness (i.e., a forecaster approximately minimizes error by always reporting the true probabilities). Existing measures satisfy at most one of these properties, but not both. We introduce a new calibration measure termed subsampled step calibration, $\mathrm{StepCE}^{\mathrm{sub}}$, that is both decision-theoretic and truthful. In particular, on any product distribution, $\mathrm{StepCE}^{\mathrm{sub}}$ is truthful up to an $O(1)$ factor, whereas prior decision-theoretic calibration measures suffer from an $e^{-\Omega(T)}$–$\Omega(\sqrt{T})$ truthfulness gap. Moreover, in any smoothed setting where the conditional probability of each event is perturbed by noise of magnitude $c>0$, $\mathrm{StepCE}^{\mathrm{sub}}$ is truthful up to an $O(\sqrt{\log(1/c)})$ factor, while prior decision-theoretic measures have an $e^{-\Omega(T)}$–$\Omega(T^{1/3})$ truthfulness gap. We also prove a general impossibility result for truthful decision-theoretic forecasting: any complete and decision-theoretic calibration measure must be discontinuous and non-truthful in the non-smoothed setting.
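
As illustrative background (standard notions, not definitions taken from this paper): for forecasts $p_1,\dots,p_T \in [0,1]$ and binary outcomes $y_1,\dots,y_T \in \{0,1\}$, perfect calibration requires that the outcomes average to $\alpha$ on exactly the rounds where the forecast equals $\alpha$, and a common way to quantify the violation is the $\ell_1$ calibration error
$$\mathrm{CalErr}(p_{1:T}, y_{1:T}) \;=\; \sum_{\alpha \in \{p_1,\dots,p_T\}} \Bigl|\, \sum_{t=1}^{T} (y_t - \alpha)\,\mathbf{1}[p_t = \alpha] \,\Bigr|.$$
The measure $\mathrm{StepCE}^{\mathrm{sub}}$ studied here instead aggregates the forecasts’ bias against step (threshold) functions over randomly subsampled rounds; its precise definition, along with the decision-theoretic and truthfulness guarantees stated above, is given in the paper.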

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-qiao25a,
  title     = {Truthfulness of Decision-Theoretic Calibration Measures},
  author    = {Qiao, Mingda and Zhao, Eric},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {4686--4739},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/qiao25a/qiao25a.pdf},
  url       = {https://proceedings.mlr.press/v291/qiao25a.html},
  abstract  = {Calibration measures quantify how much a forecaster’s predictions violate calibration, which requires that forecasts are unbiased conditioning on the forecasted probabilities. Two important desiderata for a calibration measure are its decision-theoretic implications (i.e., downstream decision-makers that best respond to the forecasts are always no-regret) and its truthfulness (i.e., a forecaster approximately minimizes error by always reporting the true probabilities). Existing measures satisfy at most one of the properties, but not both. We introduce a new calibration measure termed subsampled step calibration, $\mathrm{StepCE}^{\mathrm{sub}}$, that is both decision-theoretic and truthful. In particular, on any product distribution, $\mathrm{StepCE}^{\mathrm{sub}}$ is truthful up to an $O(1)$ factor whereas prior decision-theoretic calibration measures suffer from an $e^{-\Omega(T)}$–$\Omega(\sqrt{T})$ truthfulness gap. Moreover, in any smoothed setting where the conditional probability of each event is perturbed by a noise of magnitude $c>0$, $\mathrm{StepCE}^{\mathrm{sub}}$ is truthful up to an $O(\sqrt{\log(1/c)})$ factor, while prior decision-theoretic measures have an $e^{-\Omega(T)}$–$\Omega(T^{1/3})$ truthfulness gap. We also prove a general impossibility result for truthful decision-theoretic forecasting: any complete and decision-theoretic calibration measure must be discontinuous and non-truthful in the non-smoothed setting.}
}
Endnote
%0 Conference Paper
%T Truthfulness of Decision-Theoretic Calibration Measures
%A Mingda Qiao
%A Eric Zhao
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-qiao25a
%I PMLR
%P 4686--4739
%U https://proceedings.mlr.press/v291/qiao25a.html
%V 291
%X Calibration measures quantify how much a forecaster’s predictions violate calibration, which requires that forecasts are unbiased conditioning on the forecasted probabilities. Two important desiderata for a calibration measure are its decision-theoretic implications (i.e., downstream decision-makers that best respond to the forecasts are always no-regret) and its truthfulness (i.e., a forecaster approximately minimizes error by always reporting the true probabilities). Existing measures satisfy at most one of the properties, but not both. We introduce a new calibration measure termed subsampled step calibration, $\mathrm{StepCE}^{\mathrm{sub}}$, that is both decision-theoretic and truthful. In particular, on any product distribution, $\mathrm{StepCE}^{\mathrm{sub}}$ is truthful up to an $O(1)$ factor whereas prior decision-theoretic calibration measures suffer from an $e^{-\Omega(T)}$–$\Omega(\sqrt{T})$ truthfulness gap. Moreover, in any smoothed setting where the conditional probability of each event is perturbed by a noise of magnitude $c>0$, $\mathrm{StepCE}^{\mathrm{sub}}$ is truthful up to an $O(\sqrt{\log(1/c)})$ factor, while prior decision-theoretic measures have an $e^{-\Omega(T)}$–$\Omega(T^{1/3})$ truthfulness gap. We also prove a general impossibility result for truthful decision-theoretic forecasting: any complete and decision-theoretic calibration measure must be discontinuous and non-truthful in the non-smoothed setting.
APA
Qiao, M. & Zhao, E. (2025). Truthfulness of Decision-Theoretic Calibration Measures. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:4686-4739. Available from https://proceedings.mlr.press/v291/qiao25a.html.
