Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training

Kai Sheng Tai, Peter D Bailis, Gregory Valiant
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:10065-10075, 2021.

Abstract

Self-training is a standard approach to semi-supervised learning where the learner’s own predictions on unlabeled data are used as supervision during training. In this paper, we reinterpret this label assignment process as an optimal transportation problem between examples and classes, wherein the cost of assigning an example to a class is mediated by the current predictions of the classifier. This formulation facilitates a practical annealing strategy for label assignment and allows for the inclusion of prior knowledge on class proportions via flexible upper bound constraints. The solutions to these assignment problems can be efficiently approximated using Sinkhorn iteration, thus enabling their use in the inner loop of standard stochastic optimization algorithms. We demonstrate the effectiveness of our algorithm on the CIFAR-10, CIFAR-100, and SVHN datasets in comparison with FixMatch, a state-of-the-art self-training algorithm.
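The Sinkhorn iteration mentioned in the abstract can be illustrated with a minimal sketch. This is a generic entropically regularized optimal transport solver, not the paper's implementation: the `sinkhorn` function name, the regularization strength `epsilon`, and the iteration count are illustrative choices, and the cost matrix here stands in for the classifier-derived assignment costs described in the abstract.

```python
import numpy as np

def sinkhorn(cost, row_marginals, col_marginals, epsilon=0.05, n_iters=500):
    """Approximate an entropically regularized transport plan via Sinkhorn iteration.

    cost:          (n, k) cost of assigning example i to class j
                   (e.g., derived from classifier predictions)
    row_marginals: (n,) mass per example
    col_marginals: (k,) mass per class (where class-proportion priors enter)
    """
    K = np.exp(-cost / epsilon)           # Gibbs kernel
    u = np.ones_like(row_marginals)
    v = np.ones_like(col_marginals)
    for _ in range(n_iters):
        u = row_marginals / (K @ v)       # rescale rows toward row marginals
        v = col_marginals / (K.T @ u)     # rescale columns toward column marginals
    return u[:, None] * K * v[None, :]    # transport plan P = diag(u) K diag(v)
```

Each row of the resulting plan can be renormalized into a soft label distribution for the corresponding unlabeled example; lowering `epsilon` anneals the plan toward hard assignments.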

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-tai21a,
  title     = {Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training},
  author    = {Tai, Kai Sheng and Bailis, Peter D and Valiant, Gregory},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {10065--10075},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/tai21a/tai21a.pdf},
  url       = {https://proceedings.mlr.press/v139/tai21a.html},
  abstract  = {Self-training is a standard approach to semi-supervised learning where the learner’s own predictions on unlabeled data are used as supervision during training. In this paper, we reinterpret this label assignment process as an optimal transportation problem between examples and classes, wherein the cost of assigning an example to a class is mediated by the current predictions of the classifier. This formulation facilitates a practical annealing strategy for label assignment and allows for the inclusion of prior knowledge on class proportions via flexible upper bound constraints. The solutions to these assignment problems can be efficiently approximated using Sinkhorn iteration, thus enabling their use in the inner loop of standard stochastic optimization algorithms. We demonstrate the effectiveness of our algorithm on the CIFAR-10, CIFAR-100, and SVHN datasets in comparison with FixMatch, a state-of-the-art self-training algorithm.}
}
Endnote
%0 Conference Paper
%T Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training
%A Kai Sheng Tai
%A Peter D Bailis
%A Gregory Valiant
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-tai21a
%I PMLR
%P 10065--10075
%U https://proceedings.mlr.press/v139/tai21a.html
%V 139
%X Self-training is a standard approach to semi-supervised learning where the learner’s own predictions on unlabeled data are used as supervision during training. In this paper, we reinterpret this label assignment process as an optimal transportation problem between examples and classes, wherein the cost of assigning an example to a class is mediated by the current predictions of the classifier. This formulation facilitates a practical annealing strategy for label assignment and allows for the inclusion of prior knowledge on class proportions via flexible upper bound constraints. The solutions to these assignment problems can be efficiently approximated using Sinkhorn iteration, thus enabling their use in the inner loop of standard stochastic optimization algorithms. We demonstrate the effectiveness of our algorithm on the CIFAR-10, CIFAR-100, and SVHN datasets in comparison with FixMatch, a state-of-the-art self-training algorithm.
APA
Tai, K.S., Bailis, P.D. & Valiant, G. (2021). Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:10065-10075. Available from https://proceedings.mlr.press/v139/tai21a.html.

Related Material