A New Look at Nearest Neighbours: Identifying Benign Input Geometries via Random Projections
Asian Conference on Machine Learning, PMLR 45:65-80, 2016.
Abstract
It is well known that in general, the nearest neighbour rule (NN) has sample complexity that is exponential in the input space dimension d when only smoothness is assumed on the label posterior function. Here we consider NN on randomly projected data, and we show that, if the input domain has a small "metric size", then the sample complexity becomes exponential in the metric entropy integral of the set of normalised chords of the input domain. This metric entropy integral measures the complexity of the input domain, and can be much smaller than d – for instance in cases when the data lies in a linear or a smooth nonlinear subspace of the ambient space, or when it has a sparse representation. We then show that the guarantees we obtain for the compressive NN also hold for the dataspace NN in bounded domains; thus the random projection takes the role of an analytic tool to identify benign structures under which NN learning is possible from a small sample size. Numerical simulations on data designed to have intrinsically low complexity confirm our theoretical findings, and display a striking agreement in the empirical performances of compressive NN and dataspace NN. This suggests that high dimensional data sets that have a low complexity underlying structure are well suited for computationally cheap compressive NN learning.
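As a rough illustration of the compressive NN setup described above, the sketch below compares dataspace 1-NN with 1-NN on randomly projected data for inputs lying on a low-dimensional linear subspace of a high-dimensional ambient space. It is not the paper's experimental protocol: the data generator, the projected dimension m, and the use of a Gaussian projection matrix are all assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.random_projection import GaussianRandomProjection
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data with low intrinsic complexity: points on a 5-dimensional
# linear subspace embedded in a d = 500 dimensional ambient space
# (an assumed stand-in for the "benign geometry" discussed in the abstract).
n, d, k_intrinsic = 2000, 500, 5
basis = rng.standard_normal((k_intrinsic, d))
latent = rng.standard_normal((n, k_intrinsic))
X = latent @ basis
y = (latent[:, 0] > 0).astype(int)  # label depends only on the low-dimensional structure

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Dataspace NN: 1-nearest-neighbour rule in the original d-dimensional space.
nn_data = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
acc_data = nn_data.score(X_te, y_te)

# Compressive NN: project to m << d dimensions with a random Gaussian matrix,
# then apply the same 1-NN rule in the projected space.
m = 20
proj = GaussianRandomProjection(n_components=m, random_state=0)
X_tr_p = proj.fit_transform(X_tr)
X_te_p = proj.transform(X_te)
nn_comp = KNeighborsClassifier(n_neighbors=1).fit(X_tr_p, y_tr)
acc_comp = nn_comp.score(X_te_p, y_te)

print(f"dataspace 1-NN accuracy:   {acc_data:.3f}")
print(f"compressive 1-NN accuracy: {acc_comp:.3f}  (m = {m}, d = {d})")
```

On data of this kind the two accuracies are typically close, consistent with the abstract's observation that compressive NN and dataspace NN behave similarly when the underlying structure has low complexity, while the compressive variant works in a much lower-dimensional space.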