Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts

Étienne Marcotte, Valentina Zantedeschi, Alexandre Drouin, Nicolas Chapados
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:23958-24004, 2023.

Abstract

Multivariate probabilistic time series forecasts are commonly evaluated via proper scoring rules, i.e., functions that are minimal in expectation for the ground-truth distribution. However, this property is not sufficient to guarantee good discrimination in the non-asymptotic regime. In this paper, we provide the first systematic finite-sample study of proper scoring rules for time series forecasting evaluation. Through a power analysis, we identify the “region of reliability” of a scoring rule, i.e., the set of practical conditions where it can be relied on to identify forecasting errors. We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions, and we gauge the generalizability of our findings to real-world tasks with an application to an electricity production problem. Our results reveal critical shortcomings in the evaluation of multivariate probabilistic forecasts as commonly performed in the literature.
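To make the notion of a proper scoring rule concrete, the following is a minimal sketch (not from the paper) of a sample-based estimator of the energy score, a standard proper scoring rule for multivariate probabilistic forecasts: it is minimized in expectation when the forecast samples are drawn from the ground-truth distribution. The function name and interface are illustrative assumptions.

```python
import numpy as np

def energy_score(samples: np.ndarray, obs: np.ndarray) -> float:
    """Sample-based estimator of the energy score.

    ES(F, y) = E||X - y|| - 0.5 * E||X - X'||,  X, X' ~ F i.i.d.

    samples: (m, d) array of m forecast samples in d dimensions
    obs:     (d,) observed outcome vector
    Lower is better; the score is nonnegative.
    """
    m = samples.shape[0]
    # First term: mean Euclidean distance from samples to the observation.
    term1 = np.mean(np.linalg.norm(samples - obs, axis=1))
    # Second term: mean pairwise distance among forecast samples
    # (diagonal differences are zero, so dividing by m*(m-1) is unbiased).
    diffs = samples[:, None, :] - samples[None, :, :]
    term2 = np.sum(np.linalg.norm(diffs, axis=-1)) / (m * (m - 1))
    return term1 - 0.5 * term2

# Sketch of the finite-sample question the paper studies: with only a
# modest number of samples, does the score still separate a good
# forecast from a biased one?
rng = np.random.default_rng(0)
y = np.zeros(3)
good_forecast = rng.normal(size=(500, 3))        # matches ground truth
biased_forecast = good_forecast + 5.0            # shifted mean
print(energy_score(good_forecast, y) < energy_score(biased_forecast, y))
```

In the asymptotic regime such comparisons are guaranteed by propriety; the paper's power analysis asks when they remain reliable at the sample sizes used in practice.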

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-marcotte23a,
  title     = {Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts},
  author    = {Marcotte, \'{E}tienne and Zantedeschi, Valentina and Drouin, Alexandre and Chapados, Nicolas},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {23958--24004},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/marcotte23a/marcotte23a.pdf},
  url       = {https://proceedings.mlr.press/v202/marcotte23a.html},
  abstract  = {Multivariate probabilistic time series forecasts are commonly evaluated via proper scoring rules, i.e., functions that are minimal in expectation for the ground-truth distribution. However, this property is not sufficient to guarantee good discrimination in the non-asymptotic regime. In this paper, we provide the first systematic finite-sample study of proper scoring rules for time series forecasting evaluation. Through a power analysis, we identify the “region of reliability” of a scoring rule, i.e., the set of practical conditions where it can be relied on to identify forecasting errors. We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions, and we gauge the generalizability of our findings to real-world tasks with an application to an electricity production problem. Our results reveal critical shortcomings in the evaluation of multivariate probabilistic forecasts as commonly performed in the literature.}
}
Endnote
%0 Conference Paper
%T Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts
%A Étienne Marcotte
%A Valentina Zantedeschi
%A Alexandre Drouin
%A Nicolas Chapados
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-marcotte23a
%I PMLR
%P 23958--24004
%U https://proceedings.mlr.press/v202/marcotte23a.html
%V 202
%X Multivariate probabilistic time series forecasts are commonly evaluated via proper scoring rules, i.e., functions that are minimal in expectation for the ground-truth distribution. However, this property is not sufficient to guarantee good discrimination in the non-asymptotic regime. In this paper, we provide the first systematic finite-sample study of proper scoring rules for time series forecasting evaluation. Through a power analysis, we identify the “region of reliability” of a scoring rule, i.e., the set of practical conditions where it can be relied on to identify forecasting errors. We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions, and we gauge the generalizability of our findings to real-world tasks with an application to an electricity production problem. Our results reveal critical shortcomings in the evaluation of multivariate probabilistic forecasts as commonly performed in the literature.
APA
Marcotte, É., Zantedeschi, V., Drouin, A., & Chapados, N. (2023). Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:23958-24004. Available from https://proceedings.mlr.press/v202/marcotte23a.html.