Dissecting the Impact of Model Misspecification in Data-Driven Optimization

Adam N. Elmachtoub, Henry Lam, Haixiang Lan, Haofeng Zhang
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1594-1602, 2025.

Abstract

Data-driven optimization aims to translate a machine learning model into decision-making by optimizing decisions on estimated costs. Such a pipeline can be conducted by fitting a distributional model, which is then plugged into the target optimization problem. While this fitting can utilize traditional methods such as maximum likelihood, a more recent approach uses estimation-optimization integration that minimizes decision error instead of estimation error. Although intuitive, the statistical benefit of the latter approach is not well understood, yet it is important to guide the prescriptive usage of machine learning. In this paper, we dissect the performance comparison between these approaches in terms of the amount of model misspecification. In particular, we show how the integrated approach offers a “universal double benefit” on the top two dominating terms of regret when the underlying model is misspecified, while the traditional approach can be advantageous when the model is nearly well-specified. Our comparison is powered by finite-sample tail regret bounds that are derived via new higher-order expansions of regrets and by leveraging a recent Berry-Esseen theorem.
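
To make the contrast concrete, below is a minimal sketch (not the authors' code) of the two pipelines on a newsvendor problem. The exponential demand model, lognormal true distribution, and cost parameters are illustrative assumptions chosen so that the model is misspecified; the paper's general setup and regret analysis are not reproduced here.

    # Hypothetical example: estimate-then-optimize vs. integrated fitting
    # on a newsvendor problem with a misspecified demand model.
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    cu, co = 4.0, 1.0                     # underage / overage costs (assumed)
    crit = cu / (cu + co)                 # critical fractile

    # True demand is lognormal, so the exponential model below is misspecified.
    demand = rng.lognormal(mean=1.0, sigma=0.5, size=200)

    def order_qty(lam):
        """Optimal order quantity if demand ~ Exponential(rate=lam)."""
        return -np.log(1.0 - crit) / lam

    def emp_cost(q, d):
        """Empirical newsvendor cost of ordering q against demand samples d."""
        return np.mean(cu * np.maximum(d - q, 0) + co * np.maximum(q - d, 0))

    # 1) Traditional pipeline: fit the model by maximum likelihood, then plug in.
    lam_mle = 1.0 / demand.mean()         # MLE of the exponential rate
    q_eto = order_qty(lam_mle)

    # 2) Integrated pipeline: fit the same model parameter by minimizing the
    #    decision cost its induced decision incurs on the data.
    res = minimize_scalar(lambda lam: emp_cost(order_qty(lam), demand),
                          bounds=(1e-3, 10.0), method="bounded")
    q_int = order_qty(res.x)

    print(f"ETO decision:        q = {q_eto:.3f}, cost = {emp_cost(q_eto, demand):.3f}")
    print(f"Integrated decision: q = {q_int:.3f}, cost = {emp_cost(q_int, demand):.3f}")

In this misspecified setting, the integrated fit selects the parameter whose induced decision performs best empirically, echoing the abstract's point that integration helps under misspecification, while maximum likelihood's statistical efficiency can favor the plug-in decision when the model is nearly well-specified.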

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-elmachtoub25a,
  title     = {Dissecting the Impact of Model Misspecification in Data-Driven Optimization},
  author    = {Elmachtoub, Adam N. and Lam, Henry and Lan, Haixiang and Zhang, Haofeng},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {1594--1602},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/elmachtoub25a/elmachtoub25a.pdf},
  url       = {https://proceedings.mlr.press/v258/elmachtoub25a.html},
  abstract  = {Data-driven optimization aims to translate a machine learning model into decision-making by optimizing decisions on estimated costs. Such a pipeline can be conducted by fitting a distributional model, which is then plugged into the target optimization problem. While this fitting can utilize traditional methods such as maximum likelihood, a more recent approach uses estimation-optimization integration that minimizes decision error instead of estimation error. Although intuitive, the statistical benefit of the latter approach is not well understood, yet it is important to guide the prescriptive usage of machine learning. In this paper, we dissect the performance comparison between these approaches in terms of the amount of model misspecification. In particular, we show how the integrated approach offers a “universal double benefit” on the top two dominating terms of regret when the underlying model is misspecified, while the traditional approach can be advantageous when the model is nearly well-specified. Our comparison is powered by finite-sample tail regret bounds that are derived via new higher-order expansions of regrets and by leveraging a recent Berry-Esseen theorem.}
}
Endnote
%0 Conference Paper
%T Dissecting the Impact of Model Misspecification in Data-Driven Optimization
%A Adam N. Elmachtoub
%A Henry Lam
%A Haixiang Lan
%A Haofeng Zhang
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-elmachtoub25a
%I PMLR
%P 1594--1602
%U https://proceedings.mlr.press/v258/elmachtoub25a.html
%V 258
%X Data-driven optimization aims to translate a machine learning model into decision-making by optimizing decisions on estimated costs. Such a pipeline can be conducted by fitting a distributional model, which is then plugged into the target optimization problem. While this fitting can utilize traditional methods such as maximum likelihood, a more recent approach uses estimation-optimization integration that minimizes decision error instead of estimation error. Although intuitive, the statistical benefit of the latter approach is not well understood, yet it is important to guide the prescriptive usage of machine learning. In this paper, we dissect the performance comparison between these approaches in terms of the amount of model misspecification. In particular, we show how the integrated approach offers a “universal double benefit” on the top two dominating terms of regret when the underlying model is misspecified, while the traditional approach can be advantageous when the model is nearly well-specified. Our comparison is powered by finite-sample tail regret bounds that are derived via new higher-order expansions of regrets and by leveraging a recent Berry-Esseen theorem.
APA
Elmachtoub, A. N., Lam, H., Lan, H., & Zhang, H. (2025). Dissecting the Impact of Model Misspecification in Data-Driven Optimization. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1594-1602. Available from https://proceedings.mlr.press/v258/elmachtoub25a.html.
