Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting

Hilaf Hasson, Danielle C. Maddix, Bernie Wang, Gaurav Gupta, Youngsuk Park
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:12616-12632, 2023.

Abstract

Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked generalization," namely training an ML algorithm that takes the inferences from the base learners as input. While stacking has been widely applied in practice, its theoretical properties are poorly understood. In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best. Our result strengthens and significantly extends the results in Van der Laan et al. (2007). Inspired by the theoretical analysis, we further propose a particular family of stacked generalizations in the context of probabilistic forecasting, each one with a different sensitivity for how much the ensemble weights are allowed to vary across items, timestamps in the forecast horizon, and quantiles. Experimental results demonstrate the performance gain of the proposed method.
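
For readers who want a concrete picture of the two ideas in the abstract, the sketch below is a minimal, hypothetical illustration rather than the authors' implementation or code accompanying the paper. It builds a small family of stacked generalizations for quantile forecasts, namely convex weights over black-box base learners that are either shared across quantile levels or allowed to vary per quantile, fits each member with an assumed exponentiated-gradient routine on the pinball loss, and then picks the member with the best held-out loss, using a single train/validation split as a simplified stand-in for the cross-validated selection analyzed in the paper. All function names (fit_simplex_weights, select_strategy) and the particular weight-sharing family are illustrative assumptions; the paper's family also lets weights vary across items and forecast-horizon timestamps, which is omitted here for brevity.

import numpy as np


def pinball_loss(y, y_hat, q):
    """Average quantile (pinball) loss at level q."""
    diff = y - y_hat
    return np.mean(np.maximum(q * diff, (q - 1.0) * diff))


def fit_simplex_weights(pred_list, y, q_list, n_iter=300, lr=0.5):
    """Fit convex-combination weights over base learners by exponentiated gradient,
    minimizing the pinball loss averaged over the given quantile levels.

    pred_list[i]: array of shape (n_models, n_samples) holding base-learner
    predictions for quantile level q_list[i].
    """
    n_models = pred_list[0].shape[0]
    w = np.full(n_models, 1.0 / n_models)
    for _ in range(n_iter):
        grad = np.zeros(n_models)
        for preds, q in zip(pred_list, q_list):
            y_hat = w @ preds
            # Subgradient of the pinball loss with respect to the ensemble prediction.
            g = np.where(y < y_hat, 1.0 - q, -q)
            grad += preds @ g / (len(y) * len(q_list))
        w = w * np.exp(-lr * grad)  # mirror-descent step on the probability simplex
        w /= w.sum()
    return w


def evaluate(weights_by_q, preds_by_q, y, quantiles):
    """Mean held-out pinball loss of one ensembling strategy."""
    return np.mean([pinball_loss(y, weights_by_q[q] @ preds_by_q[q], q) for q in quantiles])


def select_strategy(preds_tr, y_tr, preds_va, y_va, quantiles):
    """Fit each member of a small family of stacked generalizations on the training
    split and return the member with the best validation pinball loss."""
    candidates = {}
    # Member 1: a single weight vector shared by all quantile levels.
    w_shared = fit_simplex_weights([preds_tr[q] for q in quantiles], y_tr, quantiles)
    candidates["shared"] = {q: w_shared for q in quantiles}
    # Member 2: weights allowed to vary freely across quantile levels.
    candidates["per_quantile"] = {
        q: fit_simplex_weights([preds_tr[q]], y_tr, [q]) for q in quantiles
    }
    scores = {name: evaluate(w, preds_va, y_va, quantiles) for name, w in candidates.items()}
    best = min(scores, key=scores.get)
    return best, candidates[best], scores


# Tiny synthetic example: 3 base learners, 2 quantile levels (illustration only).
rng = np.random.default_rng(0)
y_tr, y_va = rng.normal(size=200), rng.normal(size=100)
quantiles = [0.1, 0.5]
preds_tr = {q: np.stack([y_tr + rng.normal(scale=s, size=200) for s in (0.1, 0.5, 1.0)])
            for q in quantiles}
preds_va = {q: np.stack([y_va + rng.normal(scale=s, size=100) for s in (0.1, 0.5, 1.0)])
            for q in quantiles}
best, weights, scores = select_strategy(preds_tr, y_tr, preds_va, y_va, quantiles)
print(best, scores)

Restricting the ensemble to simplex (convex) weights keeps each member a weighted average of the base learners' quantile predictions, matching the black-box setting described in the abstract, and the final selection step mirrors, in simplified form, the "choose the best member of a finite family by validated performance" procedure whose guarantee the paper establishes.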

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-hasson23a,
  title     = {Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting},
  author    = {Hasson, Hilaf and Maddix, Danielle C. and Wang, Bernie and Gupta, Gaurav and Park, Youngsuk},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {12616--12632},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/hasson23a/hasson23a.pdf},
  url       = {https://proceedings.mlr.press/v202/hasson23a.html},
  abstract  = {Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked generalization," namely training an ML algorithm that takes the inferences from the base learners as input. While stacking has been widely applied in practice, its theoretical properties are poorly understood. In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best. Our result strengthens and significantly extends the results in Van der Laan et al. (2007). Inspired by the theoretical analysis, we further propose a particular family of stacked generalizations in the context of probabilistic forecasting, each one with a different sensitivity for how much the ensemble weights are allowed to vary across items, timestamps in the forecast horizon, and quantiles. Experimental results demonstrate the performance gain of the proposed method.}
}
Endnote
%0 Conference Paper
%T Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting
%A Hilaf Hasson
%A Danielle C. Maddix
%A Bernie Wang
%A Gaurav Gupta
%A Youngsuk Park
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-hasson23a
%I PMLR
%P 12616--12632
%U https://proceedings.mlr.press/v202/hasson23a.html
%V 202
%X Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked generalization," namely training an ML algorithm that takes the inferences from the base learners as input. While stacking has been widely applied in practice, its theoretical properties are poorly understood. In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best. Our result strengthens and significantly extends the results in Van der Laan et al. (2007). Inspired by the theoretical analysis, we further propose a particular family of stacked generalizations in the context of probabilistic forecasting, each one with a different sensitivity for how much the ensemble weights are allowed to vary across items, timestamps in the forecast horizon, and quantiles. Experimental results demonstrate the performance gain of the proposed method.
APA
Hasson, H., Maddix, D.C., Wang, B., Gupta, G. & Park, Y. (2023). Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:12616-12632. Available from https://proceedings.mlr.press/v202/hasson23a.html.
