Tied-Augment: Controlling Representation Similarity Improves Data Augmentation

Emirhan Kurtuluş; Zichao Li; Yann Dauphin; Ekin Dogus Cubuk

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:17994-18007, 2023.

Abstract

Data augmentation methods have played an important role in the recent advance of deep learning models, and have become an indispensable component of state-of-the-art models in semi-supervised, self-supervised, and supervised training for vision. Despite incurring no additional latency at test time, data augmentation often requires more epochs of training to be effective. For example, even the simple flips-and-crops augmentation requires training for more than 5 epochs to improve performance, whereas RandAugment requires more than 90 epochs. We propose a general framework called Tied-Augment, which improves the efficacy of data augmentation in a wide range of applications by adding a simple term to the loss that can control the similarity of representations under distortions. Tied-Augment can improve state-of-the-art methods from data augmentation (e.g. RandAugment, mixup), optimization (e.g. SAM), and semi-supervised learning (e.g. FixMatch). For example, Tied-RandAugment can outperform RandAugment by 2.0% on ImageNet. Notably, using Tied-Augment, data augmentation can be made to improve generalization even when training for a few epochs and when fine-tuning. We open source our code at https://github.com/ekurtulus/tied-augment/tree/main.
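The "simple term" added to the loss can be pictured with a minimal sketch. It assumes two augmented views of the same example, a supervised cross-entropy on each view, and a squared-distance similarity term on their feature vectors. The function names, the `tie_weight` parameter, and the choice of squared distance are illustrative assumptions, not taken from the abstract; the paper and the linked repository are the authority on the actual formulation.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example from raw logits (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_squared_distance(feats_a, feats_b):
    """Mean squared difference between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(feats_a, feats_b)) / len(feats_a)


def tied_loss(logits_a, feats_a, logits_b, feats_b, label, tie_weight=1.0):
    """Supervised loss on two augmented views plus a weighted similarity term.

    tie_weight is a hypothetical hyperparameter; using squared distance as
    the similarity measure is an assumption of this sketch.
    """
    supervised = (softmax_cross_entropy(logits_a, label)
                  + softmax_cross_entropy(logits_b, label))
    similarity = mean_squared_distance(feats_a, feats_b)
    return supervised + tie_weight * similarity


# Two views with identical logits; only the features differ in the second call.
loss_same = tied_loss([2.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 1.0], label=0)
loss_diff = tied_loss([2.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0], label=0)
```

In this sketch the two calls differ only in the similarity term, so the gap between `loss_diff` and `loss_same` is exactly the feature distance scaled by `tie_weight`. Setting `tie_weight` to zero recovers plain two-view supervised training.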

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-kurtulus23a,
  title = {Tied-Augment: Controlling Representation Similarity Improves Data Augmentation},
  author = {Kurtulu\c{s}, Emirhan and Li, Zichao and Dauphin, Yann and Cubuk, Ekin Dogus},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {17994--18007},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/kurtulus23a/kurtulus23a.pdf},
  url = {https://proceedings.mlr.press/v202/kurtulus23a.html},
  abstract = {Data augmentation methods have played an important role in the recent advance of deep learning models, and have become an indispensable component of state-of-the-art models in semi-supervised, self-supervised, and supervised training for vision. Despite incurring no additional latency at test time, data augmentation often requires more epochs of training to be effective. For example, even the simple flips-and-crops augmentation requires training for more than 5 epochs to improve performance, whereas RandAugment requires more than 90 epochs. We propose a general framework called Tied-Augment, which improves the efficacy of data augmentation in a wide range of applications by adding a simple term to the loss that can control the similarity of representations under distortions. Tied-Augment can improve state-of-the-art methods from data augmentation (e.g. RandAugment, mixup), optimization (e.g. SAM), and semi-supervised learning (e.g. FixMatch). For example, Tied-RandAugment can outperform RandAugment by 2.0\% on ImageNet. Notably, using Tied-Augment, data augmentation can be made to improve generalization even when training for a few epochs and when fine-tuning. We open source our code at https://github.com/ekurtulus/tied-augment/tree/main.}
}

Endnote

%0 Conference Paper
%T Tied-Augment: Controlling Representation Similarity Improves Data Augmentation
%A Emirhan Kurtuluş
%A Zichao Li
%A Yann Dauphin
%A Ekin Dogus Cubuk
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-kurtulus23a
%I PMLR
%P 17994--18007
%U https://proceedings.mlr.press/v202/kurtulus23a.html
%V 202
%X Data augmentation methods have played an important role in the recent advance of deep learning models, and have become an indispensable component of state-of-the-art models in semi-supervised, self-supervised, and supervised training for vision. Despite incurring no additional latency at test time, data augmentation often requires more epochs of training to be effective. For example, even the simple flips-and-crops augmentation requires training for more than 5 epochs to improve performance, whereas RandAugment requires more than 90 epochs. We propose a general framework called Tied-Augment, which improves the efficacy of data augmentation in a wide range of applications by adding a simple term to the loss that can control the similarity of representations under distortions. Tied-Augment can improve state-of-the-art methods from data augmentation (e.g. RandAugment, mixup), optimization (e.g. SAM), and semi-supervised learning (e.g. FixMatch). For example, Tied-RandAugment can outperform RandAugment by 2.0% on ImageNet. Notably, using Tied-Augment, data augmentation can be made to improve generalization even when training for a few epochs and when fine-tuning. We open source our code at https://github.com/ekurtulus/tied-augment/tree/main.

APA


Kurtuluş, E., Li, Z., Dauphin, Y. & Cubuk, E.D. (2023). Tied-Augment: Controlling Representation Similarity Improves Data Augmentation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:17994-18007. Available from https://proceedings.mlr.press/v202/kurtulus23a.html.
