Investigating the effect of novel classes in semi-supervised learning

Alex Yuxuan Peng, Yun Sing Koh, Patricia Riddle, Bernhard Pfahringer
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:615-630, 2019.

Abstract

Semi-supervised learning usually assumes the distribution of the unlabelled data to be the same as that of the labelled data. This assumption does not always hold in practice. We empirically show that unlabelled data containing novel examples and classes from outside the distribution of the labelled data can lead to a performance degradation for semi-supervised learning algorithms. We propose a 1-nearest-neighbour based method to assign a weight to each unlabelled example in order to reduce the negative effect of novel classes in unlabelled data. Experimental results on MNIST, Fashion-MNIST and CIFAR-10 datasets suggest that the negative effect of novel classes becomes statistically insignificant when the proposed method is applied. Using our proposed technique, models trained on unlabelled data with novel classes can achieve similar performance as ones trained on clean unlabelled data.
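The abstract names the idea (a 1-nearest-neighbour based weight for each unlabelled example) but does not give the weighting formula. The following is a minimal sketch of one way such a weight could be computed, not the authors' method. The function names `pairwise_distances` and `one_nn_weights`, the exponential decay, and the median-based distance scale are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def one_nn_weights(labelled, unlabelled):
    """Hypothetical 1-NN weighting: weights near 1 for unlabelled examples
    close to some labelled example, decaying towards 0 as they move away."""
    # Distance from each unlabelled example to its nearest labelled example.
    nearest = pairwise_distances(unlabelled, labelled).min(axis=1)

    # ASSUMPTION: use the median leave-one-out 1-NN distance inside the
    # labelled set as the length scale (needs >= 2 distinct labelled points).
    within = pairwise_distances(labelled, labelled)
    np.fill_diagonal(within, np.inf)  # ignore each point's distance to itself
    scale = np.median(within.min(axis=1))

    # ASSUMPTION: exponential decay; the paper may use a different mapping.
    return np.exp(-nearest / scale)


# Toy data: labelled and "familiar" points share a distribution,
# while "novel" points come from a far-away cluster.
rng = np.random.default_rng(0)
labelled = rng.normal(0.0, 1.0, size=(50, 2))
familiar = rng.normal(0.0, 1.0, size=(20, 2))
novel = rng.normal(10.0, 1.0, size=(20, 2))

weights = one_nn_weights(labelled, np.vstack([familiar, novel]))
print("mean weight, familiar:", weights[:20].mean())
print("mean weight, novel:   ", weights[20:].mean())
```

In a semi-supervised training loop, weights of this kind could multiply the per-example unsupervised loss term so that examples far from all labelled data contribute less. That usage is likewise an assumption rather than a detail stated in the abstract.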

Cite this Paper


BibTeX
@InProceedings{pmlr-v101-peng19a,
  title = {Investigating the effect of novel classes in semi-supervised learning},
  author = {Peng, Alex Yuxuan and Koh, Yun Sing and Riddle, Patricia and Pfahringer, Bernhard},
  booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning},
  pages = {615--630},
  year = {2019},
  editor = {Lee, Wee Sun and Suzuki, Taiji},
  volume = {101},
  series = {Proceedings of Machine Learning Research},
  month = {17--19 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v101/peng19a/peng19a.pdf},
  url = {https://proceedings.mlr.press/v101/peng19a.html},
  abstract = {Semi-supervised learning usually assumes the distribution of the unlabelled data to be the same as that of the labelled data. This assumption does not always hold in practice. We empirically show that unlabelled data containing novel examples and classes from outside the distribution of the labelled data can lead to a performance degradation for semi-supervised learning algorithms. We propose a 1-nearest-neighbour based method to assign a weight to each unlabelled example in order to reduce the negative effect of novel classes in unlabelled data. Experimental results on MNIST, Fashion-MNIST and CIFAR-10 datasets suggest that the negative effect of novel classes becomes statistically insignificant when the proposed method is applied. Using our proposed technique, models trained on unlabelled data with novel classes can achieve similar performance as ones trained on clean unlabelled data.}
}
Endnote
%0 Conference Paper
%T Investigating the effect of novel classes in semi-supervised learning
%A Alex Yuxuan Peng
%A Yun Sing Koh
%A Patricia Riddle
%A Bernhard Pfahringer
%B Proceedings of The Eleventh Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Wee Sun Lee
%E Taiji Suzuki
%F pmlr-v101-peng19a
%I PMLR
%P 615--630
%U https://proceedings.mlr.press/v101/peng19a.html
%V 101
%X Semi-supervised learning usually assumes the distribution of the unlabelled data to be the same as that of the labelled data. This assumption does not always hold in practice. We empirically show that unlabelled data containing novel examples and classes from outside the distribution of the labelled data can lead to a performance degradation for semi-supervised learning algorithms. We propose a 1-nearest-neighbour based method to assign a weight to each unlabelled example in order to reduce the negative effect of novel classes in unlabelled data. Experimental results on MNIST, Fashion-MNIST and CIFAR-10 datasets suggest that the negative effect of novel classes becomes statistically insignificant when the proposed method is applied. Using our proposed technique, models trained on unlabelled data with novel classes can achieve similar performance as ones trained on clean unlabelled data.
APA
Peng, A.Y., Koh, Y.S., Riddle, P. & Pfahringer, B. (2019). Investigating the effect of novel classes in semi-supervised learning. Proceedings of The Eleventh Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 101:615-630. Available from https://proceedings.mlr.press/v101/peng19a.html.
