DiffRed: Dimensionality reduction guided by stable rank
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3430-3438, 2024.
Abstract
In this work, we propose a novel dimensionality reduction technique, \textit{DiffRed}, which first projects the data matrix $A$ along the first $k_1$ principal components and then projects the residual matrix $A^*$ (left after subtracting the rank-$k_1$ approximation of $A$) along $k_2$ Gaussian random vectors. We evaluate two metrics: \emph{M1}, the distortion of the mean squared pairwise distance, and \emph{Stress}, the normalized root mean square of the distortion of the pairwise distances. We rigorously prove that \textit{DiffRed} achieves a general upper bound of $O\left(\sqrt{\frac{1-p}{k_2}}\right)$ on \emph{Stress} and $O\left(\frac{1-p}{\sqrt{k_2\,\rho(A^*)}}\right)$ on \emph{M1}, where $p$ is the fraction of variance explained by the first $k_1$ principal components and $\rho(A^*)$ is the \textit{stable rank} of $A^*$. These bounds are tighter than the currently known results for Random maps. Our extensive experiments on a variety of real-world datasets demonstrate that \textit{DiffRed} achieves near-zero \emph{M1} and much lower values of \emph{Stress} than well-known dimensionality reduction techniques. In particular, \textit{DiffRed} can map a 6-million-dimensional dataset to 10 dimensions with 54% lower \emph{Stress} than PCA.
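To make the two-stage map concrete, below is a minimal NumPy sketch of the projection described in the abstract. It is an illustration under stated assumptions, not the authors' reference implementation: the function names diffred and stable_rank, the centering step, and the $1/\sqrt{k_2}$ scaling of the Gaussian map are our choices; $\rho(A^*)$ is computed using the standard stable-rank definition $\|A^*\|_F^2 / \|A^*\|_2^2$.

import numpy as np

def stable_rank(M):
    # Standard definition: rho(M) = ||M||_F^2 / ||M||_2^2.
    s = np.linalg.svd(M, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)

def diffred(A, k1, k2, seed=0):
    # Map an n x D data matrix A to n x (k1 + k2) dimensions.
    rng = np.random.default_rng(seed)
    A = A - A.mean(axis=0)                     # center the data (our assumption)
    # Stage 1: project onto the first k1 principal components.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V1 = Vt[:k1].T                             # D x k1 principal directions
    Z1 = A @ V1                                # PCA part of the embedding
    # Stage 2: residual left after the rank-k1 approximation, projected
    # along k2 Gaussian random vectors; the 1/sqrt(k2) scaling preserves
    # squared norms in expectation (Johnson-Lindenstrauss style, an assumption).
    A_res = A - Z1 @ V1.T
    G = rng.standard_normal((A.shape[1], k2)) / np.sqrt(k2)
    Z2 = A_res @ G
    return np.hstack([Z1, Z2])                 # combined (k1 + k2)-dim embedding

The quantities appearing in the bounds can be read off the same decomposition: $p$ is the fraction of total variance captured by the k1 retained components, and stable_rank(A_res) gives $\rho(A^*)$ for the residual.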