Gaussian Margin Machines

Koby Crammer, Mehryar Mohri, Fernando Pereira
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:105-112, 2009.

Abstract

We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribution over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm has a natural PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves over SVMs for the same task.

We maintain a distribution over alternative weight vectors rather than committing to a single specific one. However, these distributions are not derived by Bayes' rule; instead, they represent our knowledge of the weights given the constraints imposed by the training examples. Specifically, we use a Gaussian distribution over weight vectors whose mean and covariance parameters are learned from the training data. The learning algorithm seeks a distribution with small Kullback-Leibler (KL) divergence from a fixed isotropic distribution, such that each training example is correctly classified by a strict majority of the weight vectors. Conceptually, this is a probabilistic large-margin principle, rather than the geometric large-margin principle of SVMs. The learning problem for GMMs can be expressed as a convex constrained optimization problem, and its optimal solution can be represented linearly in terms of the training instances, supporting kernelization.
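
For concreteness, here is a minimal sketch of the kind of program the abstract describes. The notation is ours, not the paper's: μ and Σ are the mean and covariance of the Gaussian over weight vectors, N(0, aI) is the fixed isotropic reference distribution, (x_i, y_i) with y_i in {-1, +1} are the n training examples, and η > 1/2 is the required classification probability (the "strict majority"); the paper's exact constraints and its convexification may differ in detail.

% Sketch in placeholder notation (requires amsmath): least-informative Gaussian
% over weight vectors, subject to each training example being classified
% correctly with probability at least eta.
\begin{align*}
  \min_{\mu,\ \Sigma \succ 0} \quad
    & \mathrm{KL}\big(\mathcal{N}(\mu, \Sigma) \,\big\|\, \mathcal{N}(0, aI)\big) \\
  \text{s.t.} \quad
    & \Pr_{w \sim \mathcal{N}(\mu, \Sigma)}\big[\, y_i \,(w \cdot x_i) \ge 0 \,\big] \;\ge\; \eta,
      \qquad i = 1, \dots, n .
\end{align*}
% Because y_i (w . x_i) is Gaussian with mean y_i (mu . x_i) and variance
% x_i^T Sigma x_i, each chance constraint above is equivalent to the
% deterministic margin constraint
\begin{equation*}
  y_i \,(\mu \cdot x_i) \;\ge\; \Phi^{-1}(\eta)\, \sqrt{x_i^{\top} \Sigma\, x_i},
\end{equation*}
% where Phi is the standard normal CDF. For eta > 1/2 the required margin on
% the mean weight vector grows with the predictive uncertainty on x_i, which
% is the sense in which the principle is a probabilistic large margin.

Minimizing KL divergence to a fixed isotropic Gaussian is one reading of "least informative"; the paper itself gives the precise formulation, its convexity analysis, and the kernelized representation of the solution.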

Cite this Paper


BibTeX
@InProceedings{pmlr-v5-crammer09a,
  title     = {Gaussian Margin Machines},
  author    = {Crammer, Koby and Mohri, Mehryar and Pereira, Fernando},
  booktitle = {Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics},
  pages     = {105--112},
  year      = {2009},
  editor    = {van Dyk, David and Welling, Max},
  volume    = {5},
  series    = {Proceedings of Machine Learning Research},
  address   = {Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v5/crammer09a/crammer09a.pdf},
  url       = {https://proceedings.mlr.press/v5/crammer09a.html},
  abstract  = {We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribution over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm has a natural PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves over SVMs for the same task.}
}
Endnote
%0 Conference Paper
%T Gaussian Margin Machines
%A Koby Crammer
%A Mehryar Mohri
%A Fernando Pereira
%B Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2009
%E David van Dyk
%E Max Welling
%F pmlr-v5-crammer09a
%I PMLR
%P 105--112
%U https://proceedings.mlr.press/v5/crammer09a.html
%V 5
%X We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribution over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm has a natural PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves over SVMs for the same task.
RIS
TY - CPAPER
TI - Gaussian Margin Machines
AU - Koby Crammer
AU - Mehryar Mohri
AU - Fernando Pereira
BT - Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
DA - 2009/04/15
ED - David van Dyk
ED - Max Welling
ID - pmlr-v5-crammer09a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 5
SP - 105
EP - 112
L1 - http://proceedings.mlr.press/v5/crammer09a/crammer09a.pdf
UR - https://proceedings.mlr.press/v5/crammer09a.html
AB - We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribution over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm has a natural PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves over SVMs for the same task.
ER -
APA
Crammer, K., Mohri, M. & Pereira, F. (2009). Gaussian Margin Machines. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 5:105-112. Available from https://proceedings.mlr.press/v5/crammer09a.html.
