Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning

Daniel Tarlow; Kevin Swersky; Laurent Charlin; Ilya Sutskever; Rich Zemel

Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning

Daniel Tarlow, Kevin Swersky, Laurent Charlin, Ilya Sutskever, Rich Zemel

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):199-207, 2013.

Abstract

Neighborhood Components Analysis (NCA) is a popular method for learning a distance metric to be used within a k-nearest neighbors (kNN) classifier. A key assumption built into the model is that each point stochastically selects a single neighbor, which makes the model well-justified only for kNN with k=1. However, kNN classifiers with k>1 are more robust and usually preferred in practice. Here we present kNCA, which generalizes NCA by learning distance metrics that are appropriate for kNN with arbitrary k. The main technical contribution is showing how to efficiently compute and optimize the expected accuracy of a kNN classifier. We apply similar ideas in an unsupervised setting to yield kSNE and ktSNE, generalizations of Stochastic Neighbor Embedding (SNE, tSNE) that operate on neighborhoods of size k, which provide an axis of control over embeddings that allow for more homogeneous and interpretable regions. Empirically, we show that kNCA often improves classification accuracy over state of the art methods, produces qualitative differences in the embeddings as k is varied, and is more robust with respect to label noise.

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-tarlow13,
  title = 	 {Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning},
  author = 	 {Tarlow, Daniel and Swersky, Kevin and Charlin, Laurent and Sutskever, Ilya and Zemel, Rich},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {199--207},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/tarlow13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/tarlow13.html},
  abstract = 	 {Neighborhood Components Analysis (NCA) is a popular method for  learning a distance metric to be used within a k-nearest neighbors  (kNN) classifier.    A key assumption built into the model is that each point  stochastically selects a single neighbor, which  makes the model well-justified only for kNN with k=1.  However, kNN classifiers with k>1 are more robust and usually   preferred in practice.     Here we present kNCA, which generalizes NCA by  learning distance metrics that are appropriate for  kNN with arbitrary k.  The main technical contribution is showing  how to efficiently compute and optimize the expected  accuracy of a kNN classifier.  We apply similar ideas in an unsupervised  setting to yield kSNE and ktSNE, generalizations of  Stochastic Neighbor Embedding (SNE, tSNE) that operate on  neighborhoods of size k, which provide an axis of control over  embeddings that allow for more homogeneous and interpretable regions.  Empirically, we show that kNCA often improves classification accuracy over  state of the art methods, produces qualitative  differences in the embeddings as k is varied, and is more robust with  respect to label noise. }
}

Endnote

%0 Conference Paper
%T Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning
%A Daniel Tarlow
%A Kevin Swersky
%A Laurent Charlin
%A Ilya Sutskever
%A Rich Zemel
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester	
%F pmlr-v28-tarlow13
%I PMLR
%P 199--207
%U https://proceedings.mlr.press/v28/tarlow13.html
%V 28
%N 3
%X Neighborhood Components Analysis (NCA) is a popular method for  learning a distance metric to be used within a k-nearest neighbors  (kNN) classifier.    A key assumption built into the model is that each point  stochastically selects a single neighbor, which  makes the model well-justified only for kNN with k=1.  However, kNN classifiers with k>1 are more robust and usually   preferred in practice.     Here we present kNCA, which generalizes NCA by  learning distance metrics that are appropriate for  kNN with arbitrary k.  The main technical contribution is showing  how to efficiently compute and optimize the expected  accuracy of a kNN classifier.  We apply similar ideas in an unsupervised  setting to yield kSNE and ktSNE, generalizations of  Stochastic Neighbor Embedding (SNE, tSNE) that operate on  neighborhoods of size k, which provide an axis of control over  embeddings that allow for more homogeneous and interpretable regions.  Empirically, we show that kNCA often improves classification accuracy over  state of the art methods, produces qualitative  differences in the embeddings as k is varied, and is more robust with  respect to label noise.

RIS


TY  - CPAPER
TI  - Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning
AU  - Daniel Tarlow
AU  - Kevin Swersky
AU  - Laurent Charlin
AU  - Ilya Sutskever
AU  - Rich Zemel
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester	
ID  - pmlr-v28-tarlow13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 199
EP  - 207
L1  - http://proceedings.mlr.press/v28/tarlow13.pdf
UR  - https://proceedings.mlr.press/v28/tarlow13.html
AB  - Neighborhood Components Analysis (NCA) is a popular method for  learning a distance metric to be used within a k-nearest neighbors  (kNN) classifier.    A key assumption built into the model is that each point  stochastically selects a single neighbor, which  makes the model well-justified only for kNN with k=1.  However, kNN classifiers with k>1 are more robust and usually   preferred in practice.     Here we present kNCA, which generalizes NCA by  learning distance metrics that are appropriate for  kNN with arbitrary k.  The main technical contribution is showing  how to efficiently compute and optimize the expected  accuracy of a kNN classifier.  We apply similar ideas in an unsupervised  setting to yield kSNE and ktSNE, generalizations of  Stochastic Neighbor Embedding (SNE, tSNE) that operate on  neighborhoods of size k, which provide an axis of control over  embeddings that allow for more homogeneous and interpretable regions.  Empirically, we show that kNCA often improves classification accuracy over  state of the art methods, produces qualitative  differences in the embeddings as k is varied, and is more robust with  respect to label noise. 
ER  -

APA


Tarlow, D., Swersky, K., Charlin, L., Sutskever, I. & Zemel, R.. (2013). Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):199-207 Available from https://proceedings.mlr.press/v28/tarlow13.html.

Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning

Abstract

Cite this Paper

Related Material