Effective Semisupervised Learning on Manifolds

Amir Globerson, Roi Livni, Shai Shalev-Shwartz
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:978-1003, 2017.

Abstract

The abundance of unlabeled data makes semi-supervised learning (SSL) an attractive approach for improving the accuracy of learning systems. However, we are still far from a complete theoretical understanding of the benefits of this learning scenario in terms of sample complexity. In particular, for many natural learning settings it can in fact be shown that SSL does not improve sample complexity. Thus far, the only case where SSL provably helps, without compatibility assumptions, is a recent combinatorial construction of Darnstadt et al. Deriving similar theoretical guarantees for more commonly used approaches to SSL remains a challenge. Here, we provide the first analysis of manifold based SSL, where there is a provable gap between supervised learning and SSL, and this gap can be arbitrarily large. Proving the required lower bound is a technical challenge, involving tools from geometric measure theory. The algorithm we analyse is similar to subspace clustering, and thus our results demonstrate that this method can be used to improve sample complexity.

Cite this Paper


BibTeX
@InProceedings{pmlr-v65-globerson17a,
  title     = {Effective Semisupervised Learning on Manifolds},
  author    = {Globerson, Amir and Livni, Roi and Shalev-Shwartz, Shai},
  booktitle = {Proceedings of the 2017 Conference on Learning Theory},
  pages     = {978--1003},
  year      = {2017},
  editor    = {Kale, Satyen and Shamir, Ohad},
  volume    = {65},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--10 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v65/globerson17a/globerson17a.pdf},
  url       = {https://proceedings.mlr.press/v65/globerson17a.html},
  abstract  = {The abundance of unlabeled data makes semi-supervised learning (SSL) an attractive approach for improving the accuracy of learning systems. However, we are still far from a complete theoretical understanding of the benefits of this learning scenario in terms of sample complexity. In particular, for many natural learning settings it can in fact be shown that SSL does not improve sample complexity. Thus far, the only case where SSL provably helps, without compatibility assumptions, is a recent combinatorial construction of Darnstadt et al. Deriving similar theoretical guarantees for more commonly used approaches to SSL remains a challenge. Here, we provide the first analysis of manifold based SSL, where there is a provable gap between supervised learning and SSL, and this gap can be arbitrarily large. Proving the required lower bound is a technical challenge, involving tools from geometric measure theory. The algorithm we analyse is similar to subspace clustering, and thus our results demonstrate that this method can be used to improve sample complexity.}
}
Endnote
%0 Conference Paper
%T Effective Semisupervised Learning on Manifolds
%A Amir Globerson
%A Roi Livni
%A Shai Shalev-Shwartz
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-globerson17a
%I PMLR
%P 978--1003
%U https://proceedings.mlr.press/v65/globerson17a.html
%V 65
%X The abundance of unlabeled data makes semi-supervised learning (SSL) an attractive approach for improving the accuracy of learning systems. However, we are still far from a complete theoretical understanding of the benefits of this learning scenario in terms of sample complexity. In particular, for many natural learning settings it can in fact be shown that SSL does not improve sample complexity. Thus far, the only case where SSL provably helps, without compatibility assumptions, is a recent combinatorial construction of Darnstadt et al. Deriving similar theoretical guarantees for more commonly used approaches to SSL remains a challenge. Here, we provide the first analysis of manifold based SSL, where there is a provable gap between supervised learning and SSL, and this gap can be arbitrarily large. Proving the required lower bound is a technical challenge, involving tools from geometric measure theory. The algorithm we analyse is similar to subspace clustering, and thus our results demonstrate that this method can be used to improve sample complexity.
APA
Globerson, A., Livni, R., & Shalev-Shwartz, S. (2017). Effective Semisupervised Learning on Manifolds. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research, 65:978-1003. Available from https://proceedings.mlr.press/v65/globerson17a.html.