Learning to Augment with Feature Side-information

Amina Mollaysa, Alexandros Kalousis, Eric Bruno, Maurits Diephuis
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:173-187, 2019.

Abstract

Neural networks typically need large amounts of training data to generalize well. A common approach is to artificially generate samples using prior knowledge of the data properties or other relevant domain knowledge. However, if the assumptions about the data properties are inaccurate or the domain knowledge is irrelevant to the task at hand, training on such augmented data may degrade learning performance compared to simply training on the limited available dataset. We propose a critical data augmentation method using feature side-information, which is obtained from domain knowledge and provides detailed information about features' intrinsic properties. Most importantly, we introduce an instance-wise quality-checking procedure that filters out irrelevant or harmful augmented data before it enters the model. We validated this approach on both synthetic and real-world datasets, specifically in a scenario where data augmentation is based on a task-independent, unreliable source of information. The experiments show that the critical data augmentation scheme helps avoid the performance degradation that results from incorporating incorrect augmented data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v101-mollaysa19a,
  title     = {Learning to Augment with Feature Side-information},
  author    = {Mollaysa, Amina and Kalousis, Alexandros and Bruno, Eric and Diephuis, Maurits},
  booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning},
  pages     = {173--187},
  year      = {2019},
  editor    = {Lee, Wee Sun and Suzuki, Taiji},
  volume    = {101},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v101/mollaysa19a/mollaysa19a.pdf},
  url       = {https://proceedings.mlr.press/v101/mollaysa19a.html},
  abstract  = {Neural networks typically need huge amounts of data to train in order to get reasonable generalizable results. A common approach is to artificially generate samples by using prior knowledge of the data properties or other relevant domain knowledge. However, if the assumptions on the data properties are not accurate or the domain knowledge is irrelevant to the task at hand, one may end up degenerating learning performance by using such augmented data in comparison to simply training on the limited available dataset. We propose a critical data augmentation method using feature side-information, which is obtained from domain knowledge and provides detailed information about features' intrinsic properties. Most importantly, we introduce an instance wise quality checking procedure on the augmented data. It filters out irrelevant or harmful augmented data prior to entering the model. We validated this approach on both synthetic and real-world datasets, specifically in a scenario where the data augmentation is done based on a task independent, unreliable source of information. The experiments show that the introduced critical data augmentation scheme helps avoid performance degeneration resulting from incorporating wrong augmented data.}
}
Endnote
%0 Conference Paper
%T Learning to Augment with Feature Side-information
%A Amina Mollaysa
%A Alexandros Kalousis
%A Eric Bruno
%A Maurits Diephuis
%B Proceedings of The Eleventh Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Wee Sun Lee
%E Taiji Suzuki
%F pmlr-v101-mollaysa19a
%I PMLR
%P 173--187
%U https://proceedings.mlr.press/v101/mollaysa19a.html
%V 101
%X Neural networks typically need huge amounts of data to train in order to get reasonable generalizable results. A common approach is to artificially generate samples by using prior knowledge of the data properties or other relevant domain knowledge. However, if the assumptions on the data properties are not accurate or the domain knowledge is irrelevant to the task at hand, one may end up degenerating learning performance by using such augmented data in comparison to simply training on the limited available dataset. We propose a critical data augmentation method using feature side-information, which is obtained from domain knowledge and provides detailed information about features' intrinsic properties. Most importantly, we introduce an instance wise quality checking procedure on the augmented data. It filters out irrelevant or harmful augmented data prior to entering the model. We validated this approach on both synthetic and real-world datasets, specifically in a scenario where the data augmentation is done based on a task independent, unreliable source of information. The experiments show that the introduced critical data augmentation scheme helps avoid performance degeneration resulting from incorporating wrong augmented data.
APA
Mollaysa, A., Kalousis, A., Bruno, E. & Diephuis, M. (2019). Learning to Augment with Feature Side-information. Proceedings of The Eleventh Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 101:173-187. Available from https://proceedings.mlr.press/v101/mollaysa19a.html.
