DiffRed: Dimensionality reduction guided by stable rank

Prarabdh Shukla; Gagan Raj Gupta; Kunal Dutta

DiffRed: Dimensionality reduction guided by stable rank

Prarabdh Shukla, Gagan Raj Gupta, Kunal Dutta

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3430-3438, 2024.

Abstract

In this work, we propose a novel dimensionality reduction technique, \textit{DiffRed}, which first projects the data matrix, A, along first $k_1$ principal components and the residual matrix $A^{*}$ (left after subtracting its $k_1$-rank approximation) along $k_2$ Gaussian random vectors. We evaluate \emph{M1}, the distortion of mean-squared pair-wise distance, and \emph{Stress}, the normalized value of RMS of distortion of the pairwise distances. We rigorously prove that \textit{DiffRed} achieves a general upper bound of $O\left(\sqrt{\frac{1-p}{k_2}}\right)$ on \emph{Stress} and $O\left(\frac{1-p}{\sqrt{k_2*\rho(A^{*})}}\right)$ on \emph{M1} where $p$ is the fraction of variance explained by the first $k_1$ principal components and $\rho(A^{*})$ is the \textit{stable rank} of $A^{*}$. These bounds are tighter than the currently known results for Random maps. Our extensive experiments on a variety of real-world datasets demonstrate that \textit{DiffRed} achieves near zero \emph{M1} and much lower values of \emph{Stress} as compared to the well-known dimensionality reduction techniques. In particular, \textit{DiffRed} can map a 6 million dimensional dataset to 10 dimensions with 54% lower \emph{Stress} than PCA.

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-shukla24a,
  title = 	 {{D}iff{R}ed: Dimensionality reduction guided by stable rank},
  author =       {Shukla, Prarabdh and Raj Gupta, Gagan and Dutta, Kunal},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3430--3438},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/shukla24a/shukla24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/shukla24a.html},
  abstract = 	 {In this work, we propose a novel dimensionality reduction technique, \textit{DiffRed}, which first projects the data matrix, A, along first $k_1$ principal components and the residual matrix $A^{*}$ (left after subtracting its $k_1$-rank approximation) along $k_2$ Gaussian random vectors. We evaluate \emph{M1}, the distortion of mean-squared pair-wise distance, and \emph{Stress}, the normalized value of RMS of distortion of the pairwise distances. We rigorously prove that \textit{DiffRed} achieves a general upper bound of $O\left(\sqrt{\frac{1-p}{k_2}}\right)$ on \emph{Stress} and $O\left(\frac{1-p}{\sqrt{k_2*\rho(A^{*})}}\right)$ on \emph{M1} where $p$ is the fraction of variance explained by the first $k_1$ principal components and $\rho(A^{*})$ is the \textit{stable rank} of $A^{*}$. These bounds are tighter than the currently known results for Random maps. Our extensive experiments on a variety of real-world datasets demonstrate that \textit{DiffRed} achieves near zero \emph{M1} and much lower values of \emph{Stress} as compared to the well-known dimensionality reduction techniques. In particular, \textit{DiffRed} can map a 6 million dimensional dataset to 10 dimensions with 54% lower \emph{Stress} than PCA.}
}

Endnote

%0 Conference Paper
%T DiffRed: Dimensionality reduction guided by stable rank
%A Prarabdh Shukla
%A Gagan Raj Gupta
%A Kunal Dutta
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li	
%F pmlr-v238-shukla24a
%I PMLR
%P 3430--3438
%U https://proceedings.mlr.press/v238/shukla24a.html
%V 238
%X In this work, we propose a novel dimensionality reduction technique, \textit{DiffRed}, which first projects the data matrix, A, along first $k_1$ principal components and the residual matrix $A^{*}$ (left after subtracting its $k_1$-rank approximation) along $k_2$ Gaussian random vectors. We evaluate \emph{M1}, the distortion of mean-squared pair-wise distance, and \emph{Stress}, the normalized value of RMS of distortion of the pairwise distances. We rigorously prove that \textit{DiffRed} achieves a general upper bound of $O\left(\sqrt{\frac{1-p}{k_2}}\right)$ on \emph{Stress} and $O\left(\frac{1-p}{\sqrt{k_2*\rho(A^{*})}}\right)$ on \emph{M1} where $p$ is the fraction of variance explained by the first $k_1$ principal components and $\rho(A^{*})$ is the \textit{stable rank} of $A^{*}$. These bounds are tighter than the currently known results for Random maps. Our extensive experiments on a variety of real-world datasets demonstrate that \textit{DiffRed} achieves near zero \emph{M1} and much lower values of \emph{Stress} as compared to the well-known dimensionality reduction techniques. In particular, \textit{DiffRed} can map a 6 million dimensional dataset to 10 dimensions with 54% lower \emph{Stress} than PCA.

APA

Shukla, P., Raj Gupta, G. & Dutta, K.. (2024). DiffRed: Dimensionality reduction guided by stable rank. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3430-3438 Available from https://proceedings.mlr.press/v238/shukla24a.html.

DiffRed: Dimensionality reduction guided by stable rank

Abstract

Cite this Paper

Related Material