On Some Fast And Robust Classifiers For High Dimension, Low Sample Size Data

Sarbojit Roy, Jyotishka Ray Choudhury, Subhajit Dutta
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:9943-9968, 2022.

Abstract

In high dimension, low sample size (HDLSS) settings, the distance concentration phenomenon affects the performance of several popular classifiers that are based on Euclidean distances. The behaviour of these classifiers in high dimensions is completely governed by the first and second order moments of the underlying class distributions. Moreover, the classifiers become useless for such HDLSS data when the first two moments of the competing distributions are equal, or when the moments do not exist. In this work, we propose robust, computationally efficient and tuning-free classifiers applicable in the HDLSS scenario. As the data dimension increases, these classifiers yield perfect classification if the one-dimensional marginals of the underlying distributions are different. We establish strong theoretical properties for the proposed classifiers in ultrahigh-dimensional settings. Numerical experiments with a wide variety of simulated examples and analysis of real data sets exhibit clear and convincing advantages over existing methods.
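
As a rough illustration of the distance concentration phenomenon the abstract refers to (this sketch is not taken from the paper and does not implement the proposed classifiers), the following short simulation draws i.i.d. Gaussian points and tracks the "relative contrast" of pairwise Euclidean distances, (max - min) / min, as the dimension grows:

    # Illustrative sketch only: distance concentration in high dimensions.
    # With the dimension d growing, pairwise Euclidean distances between
    # i.i.d. points become nearly equal, so the relative contrast shrinks.
    import numpy as np

    rng = np.random.default_rng(0)

    def relative_contrast(d, n=20):
        # n i.i.d. standard Gaussian points in R^d
        x = rng.standard_normal((n, d))
        # all pairwise Euclidean distances (upper triangle, off-diagonal)
        diffs = x[:, None, :] - x[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        pair = dists[np.triu_indices(n, k=1)]
        return (pair.max() - pair.min()) / pair.min()

    for d in [2, 10, 100, 1000, 10000]:
        print(f"d = {d:6d}   relative contrast = {relative_contrast(d):.3f}")

For a handful of points, the relative contrast typically falls from well above 1 in low dimensions to a few percent at d = 10,000, which is why classifiers built on Euclidean distances lose discriminatory power in the HDLSS regime unless the first two moments of the class distributions differ.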

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-roy22a,
  title     = {On Some Fast And Robust Classifiers For High Dimension, Low Sample Size Data},
  author    = {Roy, Sarbojit and Ray Choudhury, Jyotishka and Dutta, Subhajit},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {9943--9968},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/roy22a/roy22a.pdf},
  url       = {https://proceedings.mlr.press/v151/roy22a.html},
  abstract  = {In high dimension, low sample size (HDLSS) settings, the distance concentration phenomenon affects the performance of several popular classifiers that are based on Euclidean distances. The behaviour of these classifiers in high dimensions is completely governed by the first and second order moments of the underlying class distributions. Moreover, the classifiers become useless for such HDLSS data when the first two moments of the competing distributions are equal, or when the moments do not exist. In this work, we propose robust, computationally efficient and tuning-free classifiers applicable in the HDLSS scenario. As the data dimension increases, these classifiers yield perfect classification if the one-dimensional marginals of the underlying distributions are different. We establish strong theoretical properties for the proposed classifiers in ultrahigh-dimensional settings. Numerical experiments with a wide variety of simulated examples and analysis of real data sets exhibit clear and convincing advantages over existing methods.}
}
Endnote
%0 Conference Paper
%T On Some Fast And Robust Classifiers For High Dimension, Low Sample Size Data
%A Sarbojit Roy
%A Jyotishka Ray Choudhury
%A Subhajit Dutta
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-roy22a
%I PMLR
%P 9943--9968
%U https://proceedings.mlr.press/v151/roy22a.html
%V 151
%X In high dimension, low sample size (HDLSS) settings, the distance concentration phenomenon affects the performance of several popular classifiers that are based on Euclidean distances. The behaviour of these classifiers in high dimensions is completely governed by the first and second order moments of the underlying class distributions. Moreover, the classifiers become useless for such HDLSS data when the first two moments of the competing distributions are equal, or when the moments do not exist. In this work, we propose robust, computationally efficient and tuning-free classifiers applicable in the HDLSS scenario. As the data dimension increases, these classifiers yield perfect classification if the one-dimensional marginals of the underlying distributions are different. We establish strong theoretical properties for the proposed classifiers in ultrahigh-dimensional settings. Numerical experiments with a wide variety of simulated examples and analysis of real data sets exhibit clear and convincing advantages over existing methods.
APA
Roy, S., Ray Choudhury, J. & Dutta, S. (2022). On Some Fast And Robust Classifiers For High Dimension, Low Sample Size Data. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:9943-9968. Available from https://proceedings.mlr.press/v151/roy22a.html.