Competitive Classification and Closeness Testing

Jayadev Acharya; Hirakendu Das; Ashkan Jafarpour; Alon Orlitsky; Shengjun Pan; Ananda Suresh

Competitive Classification and Closeness Testing

Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, Shengjun Pan, Ananda Suresh

Proceedings of the 25th Annual Conference on Learning Theory, PMLR 23:22.1-22.18, 2012.

Abstract

We study the problems of \emphclassification and \emphcloseness testing. A \emphclassifier associates a test sequence with the one of two training sequences that was generated by the same distribution. A \emphcloseness test determines whether two sequences were generated by the same or by different distributions. For both problems all natural algorithms are \emphsymmetric – they make the same decision under all symbol relabelings. With no assumptions on the distributions’ support size or relative distance, we construct a classifier and closeness test that require at most O(n^3/2) samples to attain the n-sample accuracy of the best symmetric classifier or closeness test designed with knowledge of the underlying distributions. Both algorithms run in time linear in the number of samples. Conversely we also show that for any classifier or closeness test, there are distributions that require Ω(n^7/6) samples to achieve the n-sample accuracy of the best symmetric algorithm that knows the underlying distributions.

Cite this Paper

BibTeX


@InProceedings{pmlr-v23-acharya12,
  title = 	 {Competitive Classification and Closeness Testing},
  author = 	 {Acharya, Jayadev and Das, Hirakendu and Jafarpour, Ashkan and Orlitsky, Alon and Pan, Shengjun and Suresh, Ananda},
  booktitle = 	 {Proceedings of the 25th Annual Conference on Learning Theory},
  pages = 	 {22.1--22.18},
  year = 	 {2012},
  editor = 	 {Mannor, Shie and Srebro, Nathan and Williamson, Robert C.},
  volume = 	 {23},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Edinburgh, Scotland},
  month = 	 {25--27 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v23/acharya12/acharya12.pdf},
  url = 	 {https://proceedings.mlr.press/v23/acharya12.html},
  abstract = 	 {We study the problems of \emphclassification and \emphcloseness testing. A \emphclassifier associates a test sequence with the one of two training sequences that was generated by the same distribution. A \emphcloseness test determines whether two sequences were generated by the same or by different distributions. For both problems all natural algorithms are \emphsymmetric – they make the same decision under all symbol relabelings. With no assumptions on the distributions’ support size or relative distance, we construct a classifier and closeness test that require at most O(n^3/2) samples to attain the n-sample accuracy of the best symmetric classifier or closeness test designed with knowledge of the underlying distributions. Both algorithms run in time linear in the number of samples. Conversely we also show that for any classifier or closeness test, there are distributions that require Ω(n^7/6) samples to achieve the n-sample accuracy of the best symmetric algorithm that knows the underlying distributions.}
}

Endnote

%0 Conference Paper
%T Competitive Classification and Closeness Testing
%A Jayadev Acharya
%A Hirakendu Das
%A Ashkan Jafarpour
%A Alon Orlitsky
%A Shengjun Pan
%A Ananda Suresh
%B Proceedings of the 25th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2012
%E Shie Mannor
%E Nathan Srebro
%E Robert C. Williamson	
%F pmlr-v23-acharya12
%I PMLR
%P 22.1--22.18
%U https://proceedings.mlr.press/v23/acharya12.html
%V 23
%X We study the problems of \emphclassification and \emphcloseness testing. A \emphclassifier associates a test sequence with the one of two training sequences that was generated by the same distribution. A \emphcloseness test determines whether two sequences were generated by the same or by different distributions. For both problems all natural algorithms are \emphsymmetric – they make the same decision under all symbol relabelings. With no assumptions on the distributions’ support size or relative distance, we construct a classifier and closeness test that require at most O(n^3/2) samples to attain the n-sample accuracy of the best symmetric classifier or closeness test designed with knowledge of the underlying distributions. Both algorithms run in time linear in the number of samples. Conversely we also show that for any classifier or closeness test, there are distributions that require Ω(n^7/6) samples to achieve the n-sample accuracy of the best symmetric algorithm that knows the underlying distributions.

RIS


TY  - CPAPER
TI  - Competitive Classification and Closeness Testing
AU  - Jayadev Acharya
AU  - Hirakendu Das
AU  - Ashkan Jafarpour
AU  - Alon Orlitsky
AU  - Shengjun Pan
AU  - Ananda Suresh
BT  - Proceedings of the 25th Annual Conference on Learning Theory
DA  - 2012/06/16
ED  - Shie Mannor
ED  - Nathan Srebro
ED  - Robert C. Williamson	
ID  - pmlr-v23-acharya12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 23
SP  - 22.1
EP  - 22.18
L1  - http://proceedings.mlr.press/v23/acharya12/acharya12.pdf
UR  - https://proceedings.mlr.press/v23/acharya12.html
AB  - We study the problems of \emphclassification and \emphcloseness testing. A \emphclassifier associates a test sequence with the one of two training sequences that was generated by the same distribution. A \emphcloseness test determines whether two sequences were generated by the same or by different distributions. For both problems all natural algorithms are \emphsymmetric – they make the same decision under all symbol relabelings. With no assumptions on the distributions’ support size or relative distance, we construct a classifier and closeness test that require at most O(n^3/2) samples to attain the n-sample accuracy of the best symmetric classifier or closeness test designed with knowledge of the underlying distributions. Both algorithms run in time linear in the number of samples. Conversely we also show that for any classifier or closeness test, there are distributions that require Ω(n^7/6) samples to achieve the n-sample accuracy of the best symmetric algorithm that knows the underlying distributions.
ER  -

APA


Acharya, J., Das, H., Jafarpour, A., Orlitsky, A., Pan, S. & Suresh, A.. (2012). Competitive Classification and Closeness Testing. Proceedings of the 25th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 23:22.1-22.18 Available from https://proceedings.mlr.press/v23/acharya12.html.

Related Material

Download PDF