Nearly Optimal Classification for Semimetrics

Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:379-388, 2016.

Abstract

We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the density dimension, denoted dens, and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates. Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d = (R/γ)^dens points, and further that finding a significantly better compression is algorithmically intractable unless P = NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.
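
As a concrete illustration (a minimal sketch, not taken from the paper): squared Euclidean distance is a standard semimetric in the above sense, since it is non-negative and symmetric but can violate the triangle inequality; the sketch below also evaluates the abstract's compression size d ≈ (R/γ)^dens for purely hypothetical values of R, γ, and dens.

# Minimal sketch (not from the paper): squared Euclidean distance is a
# semimetric -- non-negative and symmetric, yet it can violate the triangle
# inequality -- and the abstract's compression size d ~ (R/gamma)^dens is
# evaluated for hypothetical parameter values.
import math

def sq_dist(x: float, y: float) -> float:
    """Squared Euclidean distance on the real line (a semimetric)."""
    return (x - y) ** 2

# Non-negativity and symmetry hold for every pair of points.
assert sq_dist(1.0, 4.0) == sq_dist(4.0, 1.0) >= 0.0

# The triangle inequality can fail: d(0,2) = 4 > d(0,1) + d(1,2) = 2.
assert sq_dist(0.0, 2.0) > sq_dist(0.0, 1.0) + sq_dist(1.0, 2.0)

def compression_size(radius: float, margin: float, dens: float) -> int:
    """Rough compressed-sample size d ~ (R/gamma)^dens, as stated in the abstract."""
    return math.ceil((radius / margin) ** dens)

# Hypothetical values chosen only for illustration: R = 10, gamma = 0.5, dens = 1.
print(compression_size(radius=10.0, margin=0.5, dens=1.0))  # prints 20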

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-gottlieb16,
  title     = {Nearly Optimal Classification for Semimetrics},
  author    = {Gottlieb, Lee-Ad and Kontorovich, Aryeh and Nisnevitch, Pinhas},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {379--388},
  year      = {2016},
  editor    = {Gretton, Arthur and Robert, Christian C.},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/gottlieb16.pdf},
  url       = {https://proceedings.mlr.press/v51/gottlieb16.html},
  abstract  = {We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the density dimension, denoted dens, and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates. Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d = (R/γ)^dens points, and further that finding a significantly better compression is algorithmically intractable unless P = NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.}
}
Endnote
%0 Conference Paper
%T Nearly Optimal Classification for Semimetrics
%A Lee-Ad Gottlieb
%A Aryeh Kontorovich
%A Pinhas Nisnevitch
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-gottlieb16
%I PMLR
%P 379--388
%U https://proceedings.mlr.press/v51/gottlieb16.html
%V 51
%X We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the density dimension, denoted dens, and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates. Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d = (R/γ)^dens points, and further that finding a significantly better compression is algorithmically intractable unless P = NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.
RIS
TY  - CPAPER
TI  - Nearly Optimal Classification for Semimetrics
AU  - Lee-Ad Gottlieb
AU  - Aryeh Kontorovich
AU  - Pinhas Nisnevitch
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-gottlieb16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 379
EP  - 388
L1  - http://proceedings.mlr.press/v51/gottlieb16.pdf
UR  - https://proceedings.mlr.press/v51/gottlieb16.html
AB  - We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the density dimension, denoted dens, and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates. Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d = (R/γ)^dens points, and further that finding a significantly better compression is algorithmically intractable unless P = NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.
ER  -
APA
Gottlieb, L., Kontorovich, A. & Nisnevitch, P. (2016). Nearly Optimal Classification for Semimetrics. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:379-388. Available from https://proceedings.mlr.press/v51/gottlieb16.html.
