MixupE: Understanding and improving Mixup from directional derivative perspective

Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:2597-2607, 2023.

Abstract

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
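For reference, below is a minimal sketch of the vanilla Mixup augmentation that the paper analyzes and builds upon: each training pair is convexly combined with a randomly paired example, with a mixing coefficient drawn from a Beta distribution. Function and parameter names (mixup_batch, alpha) are illustrative and not taken from the paper's code; this shows plain Mixup, not the proposed MixupE variant.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Vanilla Mixup: convexly combine a batch with a shuffled copy of itself.

    x: inputs of shape (batch, ...); y: one-hot labels of shape (batch, classes).
    The mixing coefficient lambda is drawn from Beta(alpha, alpha).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)       # interpolation weight in [0, 1]
    perm = rng.permutation(len(x))     # random pairing of examples within the batch
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed

# Example: mix a toy batch of 4 examples with 3 classes.
x = np.random.randn(4, 8).astype(np.float32)
y = np.eye(3, dtype=np.float32)[np.array([0, 2, 1, 0])]
x_mix, y_mix = mixup_batch(x, y, alpha=0.2)
```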

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-zou23a,
  title     = {{MixupE}: Understanding and improving Mixup from directional derivative perspective},
  author    = {Zou, Yingtian and Verma, Vikas and Mittal, Sarthak and Tang, Wai Hoh and Pham, Hieu and Kannala, Juho and Bengio, Yoshua and Solin, Arno and Kawaguchi, Kenji},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages     = {2597--2607},
  year      = {2023},
  editor    = {Evans, Robin J. and Shpitser, Ilya},
  volume    = {216},
  series    = {Proceedings of Machine Learning Research},
  month     = {31 Jul--04 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v216/zou23a/zou23a.pdf},
  url       = {https://proceedings.mlr.press/v216/zou23a.html},
  abstract  = {Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.}
}
Endnote
%0 Conference Paper
%T MixupE: Understanding and improving Mixup from directional derivative perspective
%A Yingtian Zou
%A Vikas Verma
%A Sarthak Mittal
%A Wai Hoh Tang
%A Hieu Pham
%A Juho Kannala
%A Yoshua Bengio
%A Arno Solin
%A Kenji Kawaguchi
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-zou23a
%I PMLR
%P 2597--2607
%U https://proceedings.mlr.press/v216/zou23a.html
%V 216
%X Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
APA
Zou, Y., Verma, V., Mittal, S., Tang, W.H., Pham, H., Kannala, J., Bengio, Y., Solin, A. & Kawaguchi, K. (2023). MixupE: Understanding and improving Mixup from directional derivative perspective. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:2597-2607. Available from https://proceedings.mlr.press/v216/zou23a.html.