Unsupervised Supervised Learning II: Margin-Based Classification without Labels

Krishnakumar Balasubramanian, Pinar Donmez, Guy Lebanon
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15:137-145, 2011.

Abstract

Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing margin-based risk functions. Traditionally, these risk functions are computed based on a labeled dataset. We develop a novel technique for estimating such risks using only unlabeled data and knowledge of $p(y)$. We prove that the proposed risk estimator is consistent on high-dimensional datasets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers using exclusively unlabeled data.
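The abstract does not say how the risk is estimated from unlabeled data. The Python sketch below shows one way it can be done, under an assumption we add here: within each class, the classifier's score $f(X)$ is Gaussian. Under that assumption the unlabeled scores form a two-component Gaussian mixture whose mixing weights equal the known $p(y)$. The sketch fits the component means and variances by EM with the weights held fixed. It then plugs the fitted class-conditional distributions into $\sum_y p(y)\,\mathbb{E}[L(y f(X)) \mid Y=y]$, using the logistic loss as an example margin loss. The function names (`fit_margin_mixture`, `estimated_logistic_risk`, `logistic_loss`, `normal_pdf`) are hypothetical. The sketch also assumes the classifier beats chance, so that the higher-scoring component corresponds to $y=+1$.

```python
import math
import random

# Illustrative sketch with hypothetical helper names; not the paper's code.
# Assumption: the score is Gaussian within each class, f | Y=y ~ N(mu_y, var_y).


def normal_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_margin_mixture(scores, p_pos, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture on unlabeled scores.

    Mixing weights are held fixed at the known prior
    (p(y=-1), p(y=+1)) = (1 - p_pos, p_pos); only means and variances are fitted.
    Returns {label: (mean, variance)}. Assumes the classifier beats chance, so
    the upper part of the score distribution belongs to y = +1.
    """
    weights = {-1: 1.0 - p_pos, +1: p_pos}
    # Initialise by splitting the sorted scores at the prior quantile.
    s = sorted(scores)
    k = max(1, min(len(s) - 1, round(len(s) * weights[-1])))
    params = {}
    for y, part in ((-1, s[:k]), (+1, s[k:])):
        mu = sum(part) / len(part)
        var = max(sum((x - mu) ** 2 for x in part) / len(part), 1e-6)
        params[y] = (mu, var)
    for _ in range(n_iter):
        # E-step: posterior probability that each score came from the +1 component.
        resp = []
        for x in scores:
            a = weights[+1] * normal_pdf(x, *params[+1])
            b = weights[-1] * normal_pdf(x, *params[-1])
            resp.append(a / (a + b) if a + b > 0 else weights[+1])
        # M-step: update means and variances; mixing weights stay at p(y).
        for y in (+1, -1):
            r = resp if y == +1 else [1.0 - q for q in resp]
            total = sum(r)
            mu = sum(ri * x for ri, x in zip(r, scores)) / total
            var = sum(ri * (x - mu) ** 2 for ri, x in zip(r, scores)) / total
            params[y] = (mu, max(var, 1e-6))
    return params


def logistic_loss(z):
    """Numerically stable log(1 + exp(-z)) for margin z."""
    return math.log1p(math.exp(-z)) if z > 0 else -z + math.log1p(math.exp(z))


def estimated_logistic_risk(params, p_pos, grid=2001):
    """Estimate sum_y p(y) * E[loss(y * f) | Y = y] without labels.

    Each conditional expectation is integrated with the trapezoid rule over
    mu_y +/- 8 standard deviations of the fitted Gaussian.
    """
    weights = {-1: 1.0 - p_pos, +1: p_pos}
    risk = 0.0
    for y, (mu, var) in params.items():
        sd = math.sqrt(var)
        lo, hi = mu - 8.0 * sd, mu + 8.0 * sd
        h = (hi - lo) / (grid - 1)
        acc = 0.0
        for i in range(grid):
            f = lo + i * h
            w = 0.5 if i in (0, grid - 1) else 1.0  # trapezoid end-point weights
            acc += w * logistic_loss(y * f) * normal_pdf(f, mu, var)
        risk += weights[y] * acc * h
    return risk


if __name__ == "__main__":
    rng = random.Random(0)
    p_pos = 0.3
    # Synthetic check: labels are drawn only to compare against the labeled risk.
    labels = [1 if rng.random() < p_pos else -1 for _ in range(3000)]
    scores = [rng.gauss(1.5 * y, 1.0) for y in labels]
    params = fit_margin_mixture(scores, p_pos)
    est = estimated_logistic_risk(params, p_pos)
    emp = sum(logistic_loss(y * s) for y, s in zip(labels, scores)) / len(scores)
    print(f"unlabeled estimate: {est:.3f}   labeled empirical risk: {emp:.3f}")
```

Running the file simulates scores from a known two-class mixture and hides the labels from the estimator. It then prints the unlabeled estimate next to the labeled empirical risk. With well-separated classes the two should be close. The estimate can degrade if the per-class Gaussian assumption or the supplied $p(y)$ is wrong.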

Cite this Paper


BibTeX
@InProceedings{pmlr-v15-balasubramanian11a,
  title = {Unsupervised Supervised Learning II: Margin-Based Classification without Labels},
  author = {Balasubramanian, Krishnakumar and Donmez, Pinar and Lebanon, Guy},
  booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  pages = {137--145},
  year = {2011},
  editor = {Gordon, Geoffrey and Dunson, David and Dudík, Miroslav},
  volume = {15},
  series = {Proceedings of Machine Learning Research},
  address = {Fort Lauderdale, FL, USA},
  month = {11--13 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v15/balasubramanian11a/balasubramanian11a.pdf},
  url = {https://proceedings.mlr.press/v15/balasubramanian11a.html},
  abstract = {Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing margin-based risk functions. Traditionally, these risk functions are computed based on a labeled dataset. We develop a novel technique for estimating such risks using only unlabeled data and knowledge of $p(y)$. We prove that the proposed risk estimator is consistent on high-dimensional datasets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers using exclusively unlabeled data.}
}
Endnote
%0 Conference Paper
%T Unsupervised Supervised Learning II: Margin-Based Classification without Labels
%A Krishnakumar Balasubramanian
%A Pinar Donmez
%A Guy Lebanon
%B Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2011
%E Geoffrey Gordon
%E David Dunson
%E Miroslav Dudík
%F pmlr-v15-balasubramanian11a
%I PMLR
%P 137--145
%U https://proceedings.mlr.press/v15/balasubramanian11a.html
%V 15
%X Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing margin-based risk functions. Traditionally, these risk functions are computed based on a labeled dataset. We develop a novel technique for estimating such risks using only unlabeled data and knowledge of $p(y)$. We prove that the proposed risk estimator is consistent on high-dimensional datasets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers using exclusively unlabeled data.
RIS
TY  - CPAPER
TI  - Unsupervised Supervised Learning II: Margin-Based Classification without Labels
AU  - Krishnakumar Balasubramanian
AU  - Pinar Donmez
AU  - Guy Lebanon
BT  - Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
DA  - 2011/06/14
ED  - Geoffrey Gordon
ED  - David Dunson
ED  - Miroslav Dudík
ID  - pmlr-v15-balasubramanian11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 15
SP  - 137
EP  - 145
L1  - http://proceedings.mlr.press/v15/balasubramanian11a/balasubramanian11a.pdf
UR  - https://proceedings.mlr.press/v15/balasubramanian11a.html
AB  - Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing margin-based risk functions. Traditionally, these risk functions are computed based on a labeled dataset. We develop a novel technique for estimating such risks using only unlabeled data and knowledge of $p(y)$. We prove that the proposed risk estimator is consistent on high-dimensional datasets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers using exclusively unlabeled data.
ER  - 
APA
Balasubramanian, K., Donmez, P., & Lebanon, G. (2011). Unsupervised Supervised Learning II: Margin-Based Classification without Labels. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 15:137-145. Available from https://proceedings.mlr.press/v15/balasubramanian11a.html.
