Does Data Augmentation Lead to Positive Margin?

Shashank Rajput, Zhili Feng, Zachary Charles, Po-Ling Loh, Dimitris Papailiopoulos
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5321-5330, 2019.

Abstract

Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness. DA artificially expands the training set by applying random noise, rotations, crops, or even adversarial perturbations to the input data. Although DA is widely used, its capacity to provably improve robustness is not fully understood. In this work, we analyze the robustness that DA begets by quantifying the margin that DA enforces on empirical risk minimizers. We first focus on linear separators, and then a class of nonlinear models whose labeling is constant within small convex hulls of data points. We present lower bounds on the number of augmented data points required for non-zero margin, and show that commonly used DA techniques may only introduce significant margin after adding exponentially many points to the data set.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-rajput19a,
  title     = {Does Data Augmentation Lead to Positive Margin?},
  author    = {Rajput, Shashank and Feng, Zhili and Charles, Zachary and Loh, Po-Ling and Papailiopoulos, Dimitris},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {5321--5330},
  year      = {2019},
  editor    = {Kamalika Chaudhuri and Ruslan Salakhutdinov},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/rajput19a/rajput19a.pdf},
  url       = {http://proceedings.mlr.press/v97/rajput19a.html},
  abstract  = {Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness. DA artificially expands the training set by applying random noise, rotations, crops, or even adversarial perturbations to the input data. Although DA is widely used, its capacity to provably improve robustness is not fully understood. In this work, we analyze the robustness that DA begets by quantifying the margin that DA enforces on empirical risk minimizers. We first focus on linear separators, and then a class of nonlinear models whose labeling is constant within small convex hulls of data points. We present lower bounds on the number of augmented data points required for non-zero margin, and show that commonly used DA techniques may only introduce significant margin after adding exponentially many points to the data set.}
}
Endnote
%0 Conference Paper
%T Does Data Augmentation Lead to Positive Margin?
%A Shashank Rajput
%A Zhili Feng
%A Zachary Charles
%A Po-Ling Loh
%A Dimitris Papailiopoulos
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-rajput19a
%I PMLR
%P 5321--5330
%U http://proceedings.mlr.press/v97/rajput19a.html
%V 97
%X Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness. DA artificially expands the training set by applying random noise, rotations, crops, or even adversarial perturbations to the input data. Although DA is widely used, its capacity to provably improve robustness is not fully understood. In this work, we analyze the robustness that DA begets by quantifying the margin that DA enforces on empirical risk minimizers. We first focus on linear separators, and then a class of nonlinear models whose labeling is constant within small convex hulls of data points. We present lower bounds on the number of augmented data points required for non-zero margin, and show that commonly used DA techniques may only introduce significant margin after adding exponentially many points to the data set.
APA
Rajput, S., Feng, Z., Charles, Z., Loh, P.-L., & Papailiopoulos, D. (2019). Does Data Augmentation Lead to Positive Margin? Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:5321-5330. Available from http://proceedings.mlr.press/v97/rajput19a.html
