Towards Controlled Data Augmentations for Active Learning

Jianan Yang, Haobo Wang, Sai Wu, Gang Chen, Junbo Zhao
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:39524-39542, 2023.

Abstract

The mission of active learning is to identify the most valuable data samples, thus attaining decent performance with much fewer samples. The data augmentation techniques seem straightforward yet promising to enhance active learning by extending the exploration of the input space, which helps locate more valuable samples. In this work, we thoroughly study the coupling of data augmentation and active learning, thereby proposing Controllable Augmentation ManiPulator for Active Learning. In contrast to the few prior works that touched on this line, CAMPAL emphasizes a purposeful, tighten, and better-controlled integration of data augmentation into active learning in three folds: (i)-carefully designed augmentation policies applied separately on labeled and unlabeled data pools; (ii)-controlled and quantifiably optimizable augmentation strengths; (iii)-full and flexible coverage for most (if not all) active learning schemes. Theories are proposed and associated with the development of key components in CAMPAL. Through extensive empirical experiments, we bring the performance of active learning methods to a new level: an absolute performance boost of 16.99% on CIFAR-10 and 12.25 on SVHN with 1,000 annotated samples. Codes are available at https://github.com/jnzju/CAMPAL.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-yang23p, title = {Towards Controlled Data Augmentations for Active Learning}, author = {Yang, Jianan and Wang, Haobo and Wu, Sai and Chen, Gang and Zhao, Junbo}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {39524--39542}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/yang23p/yang23p.pdf}, url = {https://proceedings.mlr.press/v202/yang23p.html}, abstract = {The mission of active learning is to identify the most valuable data samples, thus attaining decent performance with much fewer samples. The data augmentation techniques seem straightforward yet promising to enhance active learning by extending the exploration of the input space, which helps locate more valuable samples. In this work, we thoroughly study the coupling of data augmentation and active learning, thereby proposing Controllable Augmentation ManiPulator for Active Learning. In contrast to the few prior works that touched on this line, CAMPAL emphasizes a purposeful, tighten, and better-controlled integration of data augmentation into active learning in three folds: (i)-carefully designed augmentation policies applied separately on labeled and unlabeled data pools; (ii)-controlled and quantifiably optimizable augmentation strengths; (iii)-full and flexible coverage for most (if not all) active learning schemes. Theories are proposed and associated with the development of key components in CAMPAL. Through extensive empirical experiments, we bring the performance of active learning methods to a new level: an absolute performance boost of 16.99% on CIFAR-10 and 12.25 on SVHN with 1,000 annotated samples. Codes are available at https://github.com/jnzju/CAMPAL.} }
Endnote
%0 Conference Paper %T Towards Controlled Data Augmentations for Active Learning %A Jianan Yang %A Haobo Wang %A Sai Wu %A Gang Chen %A Junbo Zhao %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-yang23p %I PMLR %P 39524--39542 %U https://proceedings.mlr.press/v202/yang23p.html %V 202 %X The mission of active learning is to identify the most valuable data samples, thus attaining decent performance with much fewer samples. The data augmentation techniques seem straightforward yet promising to enhance active learning by extending the exploration of the input space, which helps locate more valuable samples. In this work, we thoroughly study the coupling of data augmentation and active learning, thereby proposing Controllable Augmentation ManiPulator for Active Learning. In contrast to the few prior works that touched on this line, CAMPAL emphasizes a purposeful, tighten, and better-controlled integration of data augmentation into active learning in three folds: (i)-carefully designed augmentation policies applied separately on labeled and unlabeled data pools; (ii)-controlled and quantifiably optimizable augmentation strengths; (iii)-full and flexible coverage for most (if not all) active learning schemes. Theories are proposed and associated with the development of key components in CAMPAL. Through extensive empirical experiments, we bring the performance of active learning methods to a new level: an absolute performance boost of 16.99% on CIFAR-10 and 12.25 on SVHN with 1,000 annotated samples. Codes are available at https://github.com/jnzju/CAMPAL.
APA
Yang, J., Wang, H., Wu, S., Chen, G. & Zhao, J.. (2023). Towards Controlled Data Augmentations for Active Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:39524-39542 Available from https://proceedings.mlr.press/v202/yang23p.html.

Related Material