Gaussian Margin Machines

Koby Crammer, Mehryar Mohri, Fernando Pereira
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:105-112, 2009.

Abstract

We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribution over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm has a natural PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves over SVMs for the same task.

methods, we maintain a distribution over alternative weight vectors, rather than committing to a single specific one. However, these distributions are not derived by Bayes' rule. Instead, they represent our knowledge of the weights given constraints imposed by the training examples. Specifically, we use a Gaussian distribution over weight vectors with mean and covariance parameters that are learned from the training data. The learning algorithm seeks a distribution with a small Kullback-Leibler (KL) divergence from a fixed isotropic distribution, such that each training example is correctly classified by a strict majority of the weight vectors. Conceptually, this is a large-margin probabilistic principle, instead of the geometric large margin principle in SVMs. The learning problem for GMMs can be expressed as a convex constrained optimization, and its optimal solution

Cite this Paper


BibTeX
@InProceedings{pmlr-v5-crammer09a, title = {Gaussian Margin Machines}, author = {Koby Crammer and Mehryar Mohri and Fernando Pereira}, booktitle = {Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics}, pages = {105--112}, year = {2009}, editor = {David van Dyk and Max Welling}, volume = {5}, series = {Proceedings of Machine Learning Research}, address = {Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA}, month = {16--18 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v5/crammer09a/crammer09a.pdf}, url = {http://proceedings.mlr.press/v5/crammer09a.html}, abstract = {We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribution over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm has a natural PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves over SVMs for the same task. methods, we maintain a distribution over alternative weight vectors, rather than committing to a single specific one. However, these distributions are not derived by Bayes' rule. Instead, they represent our knowledge of the weights given constraints imposed by the training examples. Specifically, we use a Gaussian distribution over weight vectors with mean and covariance parameters that are learned from the training data. The learning algorithm seeks a distribution with a small Kullback-Leibler (KL) divergence from a fixed isotropic distribution, such that each training example is correctly classified by a strict majority of the weight vectors.
Conceptually, this is a large-margin probabilistic principle, instead of the geometric large margin principle in SVMs. The learning problem for GMMs can be expressed as a convex constrained optimization, and its optimal solution} }
Endnote
%0 Conference Paper %T Gaussian Margin Machines %A Koby Crammer %A Mehryar Mohri %A Fernando Pereira %B Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2009 %E David van Dyk %E Max Welling %F pmlr-v5-crammer09a %I PMLR %J Proceedings of Machine Learning Research %P 105--112 %U http://proceedings.mlr.press %V 5 %W PMLR %X We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribution over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm has a natural PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves over SVMs for the same task. methods, we maintain a distribution over alternative weight vectors, rather than committing to a single specific one. However, these distributions are not derived by Bayes' rule. Instead, they represent our knowledge of the weights given constraints imposed by the training examples. Specifically, we use a Gaussian distribution over weight vectors with mean and covariance parameters that are learned from the training data. The learning algorithm seeks a distribution with a small Kullback-Leibler (KL) divergence from a fixed isotropic distribution, such that each training example is correctly classified by a strict majority of the weight vectors. Conceptually, this is a large-margin probabilistic principle, instead of the geometric large margin principle in SVMs. The learning problem for GMMs can be expressed as a convex constrained optimization, and its optimal solution
RIS
TY - CPAPER TI - Gaussian Margin Machines AU - Koby Crammer AU - Mehryar Mohri AU - Fernando Pereira BT - Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics PY - 2009/04/15 DA - 2009/04/15 ED - David van Dyk ED - Max Welling ID - pmlr-v5-crammer09a PB - PMLR SP - 105 DP - PMLR EP - 112 L1 - http://proceedings.mlr.press/v5/crammer09a/crammer09a.pdf UR - http://proceedings.mlr.press/v5/crammer09a.html AB - We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribution over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm has a natural PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves over SVMs for the same task. methods, we maintain a distribution over alternative weight vectors, rather than committing to a single specific one. However, these distributions are not derived by Bayes' rule. Instead, they represent our knowledge of the weights given constraints imposed by the training examples. Specifically, we use a Gaussian distribution over weight vectors with mean and covariance parameters that are learned from the training data. The learning algorithm seeks a distribution with a small Kullback-Leibler (KL) divergence from a fixed isotropic distribution, such that each training example is correctly classified by a strict majority of the weight vectors. Conceptually, this is a large-margin probabilistic principle, instead of the geometric large margin principle in SVMs.
The learning problem for GMMs can be expressed as a convex constrained optimization, and its optimal solution ER -
APA
Crammer, K., Mohri, M. & Pereira, F. (2009). Gaussian Margin Machines. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in PMLR 5:105-112