Fast Distributionally Robust Learning with Variance-Reduced Min-Max Optimization

Yaodong Yu, Tianyi Lin, Eric V. Mazumdar, Michael Jordan
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:1219-1250, 2022.

Abstract

Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications, reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity. Existing algorithms for solving Wasserstein DRSL (one of the most popular DRSL frameworks, based on robustness to perturbations in the Wasserstein distance) have serious limitations that hinder their use in large-scale problems; in particular, they involve solving complex subproblems and they fail to make use of stochastic gradients. We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable stochastic extra-gradient algorithms which provably achieve faster convergence rates than existing approaches. We demonstrate their effectiveness on synthetic and real data when compared to existing DRSL approaches. Key to our results is the use of variance reduction and random reshuffling to accelerate stochastic min-max optimization, the analysis of which may be of independent interest.
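For readers unfamiliar with the extra-gradient template that the paper builds on, below is a minimal sketch of the basic (deterministic) extra-gradient update for a smooth min-max problem, illustrated on a toy quadratic saddle-point objective. The objective, dimensions, and step size are illustrative assumptions; the paper's variance-reduced and randomly reshuffled stochastic variants are not reproduced here.

import numpy as np

# Toy objective: f(x, y) = 0.5*||x||^2 + x^T A y - 0.5*||y||^2,
# which is strongly convex in x and strongly concave in y,
# with unique saddle point (x*, y*) = (0, 0).
rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))

def grad_x(x, y):
    # Partial gradient of f with respect to x.
    return x + A @ y

def grad_y(x, y):
    # Partial gradient of f with respect to y.
    return A.T @ x - y

x = rng.standard_normal(d)
y = rng.standard_normal(d)
eta = 0.05  # step size (illustrative choice)

for _ in range(2000):
    # Extrapolation ("look-ahead") step at the current iterate.
    x_half = x - eta * grad_x(x, y)
    y_half = y + eta * grad_y(x, y)
    # Update step, re-evaluating the gradients at the extrapolated point.
    x = x - eta * grad_x(x_half, y_half)
    y = y + eta * grad_y(x_half, y_half)

# Both norms should be close to zero, i.e. near the saddle point.
print(np.linalg.norm(x), np.linalg.norm(y))

The extrapolation step is what distinguishes extra-gradient from plain gradient descent-ascent, which can cycle or diverge on saddle-point problems; the paper's contribution is to make stochastic versions of this template converge faster via variance reduction and random reshuffling.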

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-yu22a,
  title     = {Fast Distributionally Robust Learning with Variance-Reduced Min-Max Optimization},
  author    = {Yu, Yaodong and Lin, Tianyi and Mazumdar, Eric V. and Jordan, Michael},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {1219--1250},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/yu22a/yu22a.pdf},
  url       = {https://proceedings.mlr.press/v151/yu22a.html}
}
Endnote
%0 Conference Paper
%T Fast Distributionally Robust Learning with Variance-Reduced Min-Max Optimization
%A Yaodong Yu
%A Tianyi Lin
%A Eric V. Mazumdar
%A Michael Jordan
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-yu22a
%I PMLR
%P 1219--1250
%U https://proceedings.mlr.press/v151/yu22a.html
%V 151
APA
Yu, Y., Lin, T., Mazumdar, E.V. & Jordan, M. (2022). Fast Distributionally Robust Learning with Variance-Reduced Min-Max Optimization. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:1219-1250. Available from https://proceedings.mlr.press/v151/yu22a.html.