A Gaussian Latent Variable Model for Large Margin Classification of Labeled and Unlabeled Data
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:484-492, 2014.
We investigate a Gaussian latent variable model for semi-supervised learning of linear large margin classifiers. The model’s latent variables encode the signed distance of examples to the separating hyperplane, and we constrain these variables, for both labeled and unlabeled examples, to ensure that the classes are separated by a large margin. Our approach is based on similar intuitions as semi-supervised support vector machines (S3VMs), but these intuitions are formalized in a probabilistic framework. Within this framework we are able to derive an especially simple Expectation-Maximization (EM) algorithm for learning. The algorithm alternates between applying Bayes rule to “fill in” the latent variables (the E-step) and performing an unconstrained least-squares regression to update the weight vector (the M-step). For the best results it is necessary to constrain the unlabeled data to have a similar ratio of positive to negative examples as the labeled data. Within our model this constraint renders exact inference intractable, but we show that a Lyapunov central limit theorem (for sums of independent, but non-identical random variables) provides an excellent approximation to the true posterior distribution. We perform experiments on large-scale text classification and find that our model significantly outperforms existing implementations of S3VMs.