Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup

Damien Teney, Jindong Wang, Ehsan Abbasnejad
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:47948-47964, 2024.

Abstract

Mixup is a highly successful technique for improving generalization by augmenting training data with combinations of random pairs. Selective mixup is a family of methods that apply mixup only to specific pairs, e.g., combining examples across classes or domains. Despite remarkable performance on benchmarks with distribution shifts, these methods are still poorly understood. We find that an overlooked aspect of selective mixup explains some of its success in a completely new light. The non-random selection of pairs affects the training distribution and improves generalization by means completely unrelated to the mixing. For example, in binary classification, mixup across classes implicitly resamples the data to a uniform class distribution, a classical solution to label shift. We verify empirically that this resampling explains some of the improvements reported in prior work. Theoretically, the effect relies on a “regression toward the mean”, an accidental property we find in several datasets. Outcomes: we now better understand why selective mixup works. This lets us predict a yet-unknown failure mode and conditions where the method is detrimental. We also use the equivalence with resampling to design better variants that combine mixing and resampling effects.
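
To make the resampling claim concrete, here is a minimal, hypothetical Python sketch (not the authors' implementation; the toy dataset, sampling rule, and function names are assumptions made for illustration). It selects mixup pairs across classes on an imbalanced binary dataset and checks that the examples feeding those pairs follow a roughly uniform class distribution.

    # Sketch of selective mixup "across classes" and its implicit resampling effect.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy binary dataset with imbalanced labels (~90% class 0, ~10% class 1).
    n = 10_000
    y = (rng.random(n) < 0.1).astype(int)
    x = rng.normal(loc=y[:, None], scale=1.0, size=(n, 2))

    def selective_mixup_pairs(x, y, n_pairs, alpha=0.4):
        """Sample pairs whose two members have *different* labels, then mix them."""
        idx_a = rng.integers(0, len(y), size=n_pairs)
        # For each anchor, pick a partner from the other class (the "selective" part).
        idx_b = np.array([rng.choice(np.flatnonzero(y != y[i])) for i in idx_a])
        lam = rng.beta(alpha, alpha, size=n_pairs)[:, None]
        x_mix = lam * x[idx_a] + (1 - lam) * x[idx_b]
        y_mix = lam[:, 0] * y[idx_a] + (1 - lam[:, 0]) * y[idx_b]
        return x_mix, y_mix, idx_a, idx_b

    _, _, idx_a, idx_b = selective_mixup_pairs(x, y, n_pairs=5_000)

    # Labels of the examples actually entering training via these pairs:
    sampled_labels = np.concatenate([y[idx_a], y[idx_b]])
    print("original class-1 fraction:", y.mean())              # ~0.10
    print("sampled  class-1 fraction:", sampled_labels.mean()) # ~0.50

Because the two members of each pair always carry opposite labels, the labels seen during training are balanced (~50/50) regardless of the original 90/10 split. This is the implicit resampling to a uniform class distribution described above, and it occurs whether or not the inputs are actually mixed afterwards.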

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-teney24a,
  title     = {Selective Mixup Helps with Distribution Shifts, But Not ({O}nly) because of Mixup},
  author    = {Teney, Damien and Wang, Jindong and Abbasnejad, Ehsan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {47948--47964},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/teney24a/teney24a.pdf},
  url       = {https://proceedings.mlr.press/v235/teney24a.html}
}
APA
Teney, D., Wang, J. & Abbasnejad, E. (2024). Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:47948-47964. Available from https://proceedings.mlr.press/v235/teney24a.html.