Nearly Optimal Classification for Semimetrics

Lee-Ad Gottlieb; Aryeh Kontorovich; Pinhas Nisnevitch

Nearly Optimal Classification for Semimetrics

Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:379-388, 2016.

Abstract

We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the \em density dimension \dens and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates. Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d=(R/γ)^\dens points, and further that finding a significantly better compression is algorithmically intractable unless P=NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-gottlieb16,
  title = 	 {Nearly Optimal Classification for Semimetrics},
  author = 	 {Gottlieb, Lee-Ad and Kontorovich, Aryeh and Nisnevitch, Pinhas},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {379--388},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/gottlieb16.pdf},
  url = 	 {https://proceedings.mlr.press/v51/gottlieb16.html},
  abstract = 	 {We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the \em density dimension \dens and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates.  Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d=(R/γ)^\dens points, and further that finding a significantly better compression is algorithmically intractable unless P=NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.}
}

Endnote

%0 Conference Paper
%T Nearly Optimal Classification for Semimetrics
%A Lee-Ad Gottlieb
%A Aryeh Kontorovich
%A Pinhas Nisnevitch
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert	
%F pmlr-v51-gottlieb16
%I PMLR
%P 379--388
%U https://proceedings.mlr.press/v51/gottlieb16.html
%V 51
%X We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the \em density dimension \dens and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates.  Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d=(R/γ)^\dens points, and further that finding a significantly better compression is algorithmically intractable unless P=NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.

RIS


TY  - CPAPER
TI  - Nearly Optimal Classification for Semimetrics
AU  - Lee-Ad Gottlieb
AU  - Aryeh Kontorovich
AU  - Pinhas Nisnevitch
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert	
ID  - pmlr-v51-gottlieb16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 379
EP  - 388
L1  - http://proceedings.mlr.press/v51/gottlieb16.pdf
UR  - https://proceedings.mlr.press/v51/gottlieb16.html
AB  - We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the \em density dimension \dens and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates.  Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d=(R/γ)^\dens points, and further that finding a significantly better compression is algorithmically intractable unless P=NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.
ER  -

APA


Gottlieb, L., Kontorovich, A. & Nisnevitch, P.. (2016). Nearly Optimal Classification for Semimetrics. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:379-388 Available from https://proceedings.mlr.press/v51/gottlieb16.html.

Related Material

Download PDF