Dash: Semi-Supervised Learning with Dynamic Thresholding

Yi Xu, Lei Shang, Jinxing Ye, Qi Qian, Yu-Feng Li, Baigui Sun, Hao Li, Rong Jin
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11525-11536, 2021.

Abstract

While semi-supervised learning (SSL) has received tremendous attention in many machine learning tasks due to its successful use of unlabeled data, existing SSL algorithms use either all unlabeled examples or only the unlabeled examples with fixed high-confidence predictions during the training process. However, it is possible that too many correct/wrong pseudo-labeled examples are eliminated/selected. In this work we develop a simple yet powerful framework whose key idea is to select a subset of training examples from the unlabeled data when running existing SSL methods, so that only the unlabeled examples with pseudo labels related to the labeled data are used to train models. The selection is performed at each update iteration by keeping only the examples whose losses are smaller than a given threshold that is dynamically adjusted over the iterations. Our proposed approach, Dash, enjoys adaptivity in its selection of unlabeled data and comes with a theoretical guarantee. Specifically, we establish the convergence rate of Dash from the viewpoint of non-convex optimization. Finally, we empirically demonstrate the effectiveness of the proposed method in comparison with state-of-the-art methods on benchmarks.
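The selection rule described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function name `dash_select` and the default values of `c`, `gamma`, and `rho_hat` are hypothetical stand-ins (in the paper, the threshold schedule is derived from the loss on labeled data and decays geometrically with the iteration count).

```python
import numpy as np

def dash_select(unlabeled_losses, step, c=1.0001, gamma=1.27, rho_hat=1.0):
    """Dash-style dynamic thresholding (illustrative sketch).

    Keeps only the unlabeled examples whose pseudo-label loss falls
    below a threshold that shrinks geometrically as training proceeds,
    so early iterations admit most examples and later iterations keep
    only confidently pseudo-labeled ones.

    c, gamma, rho_hat are hypothetical defaults; in the paper the
    scale rho_hat is estimated from the labeled-data loss.
    """
    threshold = c * gamma ** (-step) * rho_hat  # decays with `step`
    return np.flatnonzero(unlabeled_losses < threshold)
```

For example, an unlabeled batch with losses `[0.05, 0.5, 1.1]` would be almost fully selected at `step=0`, while at a late step the shrunken threshold filters out high-loss (likely wrongly pseudo-labeled) examples.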

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-xu21e,
  title     = {Dash: Semi-Supervised Learning with Dynamic Thresholding},
  author    = {Xu, Yi and Shang, Lei and Ye, Jinxing and Qian, Qi and Li, Yu-Feng and Sun, Baigui and Li, Hao and Jin, Rong},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11525--11536},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/xu21e/xu21e.pdf},
  url       = {https://proceedings.mlr.press/v139/xu21e.html},
  abstract  = {While semi-supervised learning (SSL) has received tremendous attentions in many machine learning tasks due to its successful use of unlabeled data, existing SSL algorithms use either all unlabeled examples or the unlabeled examples with a fixed high-confidence prediction during the training progress. However, it is possible that too many correct/wrong pseudo labeled examples are eliminated/selected. In this work we develop a simple yet powerful framework, whose key idea is to select a subset of training examples from the unlabeled data when performing existing SSL methods so that only the unlabeled examples with pseudo labels related to the labeled data will be used to train models. The selection is performed at each updating iteration by only keeping the examples whose losses are smaller than a given threshold that is dynamically adjusted through the iteration. Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection and its theoretical guarantee. Specifically, we theoretically establish the convergence rate of Dash from the view of non-convex optimization. Finally, we empirically demonstrate the effectiveness of the proposed method in comparison with state-of-the-art over benchmarks.}
}
Endnote
%0 Conference Paper
%T Dash: Semi-Supervised Learning with Dynamic Thresholding
%A Yi Xu
%A Lei Shang
%A Jinxing Ye
%A Qi Qian
%A Yu-Feng Li
%A Baigui Sun
%A Hao Li
%A Rong Jin
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-xu21e
%I PMLR
%P 11525--11536
%U https://proceedings.mlr.press/v139/xu21e.html
%V 139
%X While semi-supervised learning (SSL) has received tremendous attentions in many machine learning tasks due to its successful use of unlabeled data, existing SSL algorithms use either all unlabeled examples or the unlabeled examples with a fixed high-confidence prediction during the training progress. However, it is possible that too many correct/wrong pseudo labeled examples are eliminated/selected. In this work we develop a simple yet powerful framework, whose key idea is to select a subset of training examples from the unlabeled data when performing existing SSL methods so that only the unlabeled examples with pseudo labels related to the labeled data will be used to train models. The selection is performed at each updating iteration by only keeping the examples whose losses are smaller than a given threshold that is dynamically adjusted through the iteration. Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection and its theoretical guarantee. Specifically, we theoretically establish the convergence rate of Dash from the view of non-convex optimization. Finally, we empirically demonstrate the effectiveness of the proposed method in comparison with state-of-the-art over benchmarks.
APA
Xu, Y., Shang, L., Ye, J., Qian, Q., Li, Y., Sun, B., Li, H. & Jin, R. (2021). Dash: Semi-Supervised Learning with Dynamic Thresholding. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:11525-11536. Available from https://proceedings.mlr.press/v139/xu21e.html.