Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains

Steven Wilkins-Reeves, Xu Chen, Qi Ma, Christine Agarwal, Aude Hofleitner
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:52972-52993, 2024.

Abstract

Distribution shifts are ubiquitous in real-world machine learning applications, posing a challenge to the generalization of models trained on one data distribution to another. We focus on scenarios where data distributions vary across multiple segments of the entire population and only make local assumptions about the differences between training and test (deployment) distributions within each segment. We propose a two-stage multiply robust estimation method to improve model performance on each individual segment for tabular data analysis. The method involves fitting a linear combination of the base models, learned using clusters of training data from multiple segments, followed by a refinement step for each segment. Our method is designed to be implemented with commonly used off-the-shelf machine learning models. We establish theoretical guarantees on the generalization bound of the method on the test risk. With extensive experiments on synthetic and real datasets, we demonstrate that the proposed method substantially improves over existing alternatives in prediction accuracy and robustness on both regression and classification tasks. We also assess its effectiveness on a user city prediction dataset from Meta.
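The two-stage structure described in the abstract (base models learned on clusters of pooled training data, a per-segment linear combination of those models, then a per-segment refinement) can be illustrated with a toy sketch. This is not the authors' estimator: the abstract does not specify the clustering, the combination rule, or the refinement model, and the sketch omits the multiply robust weighting for distribution shift entirely. Every choice below is an assumption made for brevity: k-means on (features, label) for clustering, ordinary linear base models, unconstrained least-squares combination weights per segment, and a ridge fit on the residuals as the refinement. All function names (fit_base_models, fit_segment, predict_segment, etc.) are hypothetical.

```python
import numpy as np

# Toy illustration only (assumed design, not the paper's method).


def fit_linear(X, y, ridge=1e-8):
    """Ridge regression with an unpenalized intercept; returns [b0, b1, ..., bd]."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = ridge * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def kmeans_labels(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def fit_base_models(X, y, n_clusters, seed=0):
    """Cluster pooled training rows on (features, label); fit one linear model per cluster."""
    labels = kmeans_labels(np.column_stack([X, y]), n_clusters, seed=seed)
    models = []
    for j in range(n_clusters):
        mask = labels == j
        if mask.sum() > X.shape[1] + 1:  # skip clusters too small to fit
            models.append(fit_linear(X[mask], y[mask]))
    return models


def base_predictions(models, X):
    return np.column_stack([predict_linear(c, X) for c in models])


def fit_segment(models, X_s, y_s, refine_ridge=1.0):
    """Stage 1: least-squares weights over base predictions; stage 2: ridge fit on residuals."""
    P = base_predictions(models, X_s)
    weights, *_ = np.linalg.lstsq(P, y_s, rcond=None)
    refine = fit_linear(X_s, y_s - P @ weights, ridge=refine_ridge)
    return weights, refine


def predict_segment(models, weights, refine, X):
    return base_predictions(models, X) @ weights + predict_linear(refine, X)


# Synthetic example: three segments with different linear relationships.
rng = np.random.default_rng(0)
true = {0: (0.0, [1.0, 2.0]), 1: (3.0, [-2.0, 1.0]), 2: (-2.0, [0.5, -1.5])}


def sample(seg, n):
    X = rng.normal(size=(n, 2))
    b0, b = true[seg]
    return X, b0 + X @ np.array(b) + 0.1 * rng.normal(size=n)


train = {s: sample(s, 200) for s in true}
test = {s: sample(s, 200) for s in true}
X_all = np.vstack([train[s][0] for s in true])
y_all = np.concatenate([train[s][1] for s in true])

models = fit_base_models(X_all, y_all, n_clusters=3)
pooled = fit_linear(X_all, y_all)  # single model ignoring segments, for comparison
mse, mse_pooled = {}, {}
for s in true:
    w, r = fit_segment(models, *train[s])
    X_t, y_t = test[s]
    mse[s] = np.mean((predict_segment(models, w, r, X_t) - y_t) ** 2)
    mse_pooled[s] = np.mean((predict_linear(pooled, X_t) - y_t) ** 2)
print(mse, mse_pooled)
```

In this toy setting the per-segment refinement lets each segment recover its own relationship, so its test error should fall well below that of the single pooled model; the paper's actual estimator, its shift assumptions, and its guarantees are not reproduced here.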

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wilkins-reeves24a,
  title = {Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains},
  author = {Wilkins-Reeves, Steven and Chen, Xu and Ma, Qi and Agarwal, Christine and Hofleitner, Aude},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {52972--52993},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wilkins-reeves24a/wilkins-reeves24a.pdf},
  url = {https://proceedings.mlr.press/v235/wilkins-reeves24a.html},
  abstract = {Distribution shifts are ubiquitous in real-world machine learning applications, posing a challenge to the generalization of models trained on one data distribution to another. We focus on scenarios where data distributions vary across multiple segments of the entire population and only make local assumptions about the differences between training and test (deployment) distributions within each segment. We propose a two-stage multiply robust estimation method to improve model performance on each individual segment for tabular data analysis. The method involves fitting a linear combination of the base models, learned using clusters of training data from multiple segments, followed by a refinement step for each segment. Our method is designed to be implemented with commonly used off-the-shelf machine learning models. We establish theoretical guarantees on the generalization bound of the method on the test risk. With extensive experiments on synthetic and real datasets, we demonstrate that the proposed method substantially improves over existing alternatives in prediction accuracy and robustness on both regression and classification tasks. We also assess its effectiveness on a user city prediction dataset from Meta.}
}
Endnote
%0 Conference Paper
%T Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains
%A Steven Wilkins-Reeves
%A Xu Chen
%A Qi Ma
%A Christine Agarwal
%A Aude Hofleitner
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wilkins-reeves24a
%I PMLR
%P 52972--52993
%U https://proceedings.mlr.press/v235/wilkins-reeves24a.html
%V 235
%X Distribution shifts are ubiquitous in real-world machine learning applications, posing a challenge to the generalization of models trained on one data distribution to another. We focus on scenarios where data distributions vary across multiple segments of the entire population and only make local assumptions about the differences between training and test (deployment) distributions within each segment. We propose a two-stage multiply robust estimation method to improve model performance on each individual segment for tabular data analysis. The method involves fitting a linear combination of the base models, learned using clusters of training data from multiple segments, followed by a refinement step for each segment. Our method is designed to be implemented with commonly used off-the-shelf machine learning models. We establish theoretical guarantees on the generalization bound of the method on the test risk. With extensive experiments on synthetic and real datasets, we demonstrate that the proposed method substantially improves over existing alternatives in prediction accuracy and robustness on both regression and classification tasks. We also assess its effectiveness on a user city prediction dataset from Meta.
APA
Wilkins-Reeves, S., Chen, X., Ma, Q., Agarwal, C. & Hofleitner, A. (2024). Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:52972-52993. Available from https://proceedings.mlr.press/v235/wilkins-reeves24a.html.
