Joining datasets via data augmentation in the label space for neural networks

Junbo Zhao, Mingfeng Ou, Linji Xue, Yunkai Cui, Sai Wu, Gang Chen
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12686-12696, 2021.

Abstract

Most, if not all, modern deep learning systems restrict themselves to a single dataset for neural network training and inference. In this article, we are interested in systematic ways to join datasets that serve similar purposes. Unlike previously published works, which invariably conduct dataset joining in an uninterpretable latent vector space, the core of our method is an augmentation procedure in the label space. The primary challenge in joining datasets through the label space is the discrepancy between labels: non-overlapping label annotation sets, different labeling granularities or hierarchies, etc. Notably, we propose a new technique leveraging an artificially created knowledge graph, recurrent neural networks, and policy gradient that successfully achieves dataset joining in the label space. Empirical results on both image and text classification justify the validity of our approach.
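The abstract does not spell out how labels are augmented, so what follows is only a minimal sketch of the general idea of joining datasets in the label space, not the authors' method. It assumes a small hand-built label graph: the `PARENT` dict, its label names and edges, and the helpers `label_to_path` and `join_datasets` are all hypothetical. The graph maps each dataset's labels to root-to-label paths, so that non-overlapping label sets with different granularities become comparable target sequences. The paper's knowledge-graph construction, recurrent network, and policy-gradient training are not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hand-built label graph (child -> parent). It only stands in for
# the "artificially created knowledge graph"; labels and edges are invented.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "husky": "dog",
    "car": "vehicle",
}


def label_to_path(label: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Expand one label into its root-to-label path in the label graph."""
    if label not in parent:
        raise KeyError(f"label {label!r} is not in the label graph")
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:  # walk up parent links until the root
        path.append(node)
        node = parent[node]
    return list(reversed(path))  # root first, original label last


def join_datasets(
    datasets: List[List[Tuple[str, str]]], parent: Dict[str, Optional[str]]
) -> List[Tuple[str, List[str]]]:
    """Pool (example, label) pairs from several datasets; each target becomes
    a label sequence in the shared graph."""
    joined: List[Tuple[str, List[str]]] = []
    for data in datasets:
        for example, label in data:
            joined.append((example, label_to_path(label, parent)))
    return joined


# Hypothetical data: A is fine-grained, B is coarse; their label sets are disjoint.
dataset_a = [("img_001", "husky"), ("img_002", "cat")]
dataset_b = [("img_101", "animal"), ("img_102", "car")]

for example, target in join_datasets([dataset_a, dataset_b], PARENT):
    print(example, target)
```

Here the coarse label `animal` from dataset B is a prefix of the path built for `husky` from dataset A, so a sequence decoder could in principle be trained on both datasets at once. Whether the paper's recurrent network uses targets of exactly this form is an assumption of this sketch.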

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-zhao21b,
  title     = {Joining datasets via data augmentation in the label space for neural networks},
  author    = {Zhao, Junbo and Ou, Mingfeng and Xue, Linji and Cui, Yunkai and Wu, Sai and Chen, Gang},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {12686--12696},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/zhao21b/zhao21b.pdf},
  url       = {https://proceedings.mlr.press/v139/zhao21b.html},
  abstract  = {Most, if not all, modern deep learning systems restrict themselves to a single dataset for neural network training and inference. In this article, we are interested in systematic ways to join datasets that are made of similar purposes. Unlike previous published works that ubiquitously conduct the dataset joining in the uninterpretable latent vectorial space, the core to our method is an augmentation procedure in the label space. The primary challenge to address the label space for dataset joining is the discrepancy between labels: non-overlapping label annotation sets, different labeling granularity or hierarchy and etc. Notably we propose a new technique leveraging artificially created knowledge graph, recurrent neural networks and policy gradient that successfully achieve the dataset joining in the label space. Empirical results on both image and text classification justify the validity of our approach.}
}
Endnote
%0 Conference Paper
%T Joining datasets via data augmentation in the label space for neural networks
%A Junbo Zhao
%A Mingfeng Ou
%A Linji Xue
%A Yunkai Cui
%A Sai Wu
%A Gang Chen
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-zhao21b
%I PMLR
%P 12686--12696
%U https://proceedings.mlr.press/v139/zhao21b.html
%V 139
%X Most, if not all, modern deep learning systems restrict themselves to a single dataset for neural network training and inference. In this article, we are interested in systematic ways to join datasets that are made of similar purposes. Unlike previous published works that ubiquitously conduct the dataset joining in the uninterpretable latent vectorial space, the core to our method is an augmentation procedure in the label space. The primary challenge to address the label space for dataset joining is the discrepancy between labels: non-overlapping label annotation sets, different labeling granularity or hierarchy and etc. Notably we propose a new technique leveraging artificially created knowledge graph, recurrent neural networks and policy gradient that successfully achieve the dataset joining in the label space. Empirical results on both image and text classification justify the validity of our approach.
APA
Zhao, J., Ou, M., Xue, L., Cui, Y., Wu, S. & Chen, G. (2021). Joining datasets via data augmentation in the label space for neural networks. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12686-12696. Available from https://proceedings.mlr.press/v139/zhao21b.html.
