Gaussian Margin Machines


Koby Crammer, Mehryar Mohri, Fernando Pereira ;
Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, PMLR 5:105-112, 2009.


We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribu- tion over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm has a natural PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves over SVMs for the same task. methods, we maintain a distribution over alternative weight vectors, rather than committing to a single specific one. However, these distributions are not derived by Bayes? rule. Instead, they represent our knowledge of the weights given constraints imposed by the training examples. Specifically, we use a Gaussian distribution over weight vectors with mean and covariance parameters that are learned from the training data. The learning algorithm seeks for a distribu- tion with a small Kullback-Leibler (KL) divergence from a fixed isotropic distribution, such that each training exam- ple is correctly classified by a strict majority of the weight vectors. Conceptually, this is a large-margin probabilistic principle, instead of the geometric large margin principle in SVMs. The learning problem for GMMs can be expressed as a convex constrained optimization, and its optimal solution

Related Material