Dimensionality Reduction for the Sum-of-Distances Metric
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3220-3229, 2021.
Abstract
We give a dimensionality reduction procedure to approximate the sum of distances of a given set of n points in R^d to any "shape" that lies in a k-dimensional subspace. Here, by "shape" we mean any set of points in R^d. Our algorithm takes an input in the form of an n×d matrix A, where each row of A denotes a data point, and outputs a subspace P of dimension O(k^3/ϵ^6) such that the projections of the n points onto P and the distances of the points to P are sufficient to obtain an ϵ-approximation to the sum of distances to any arbitrary shape that lies in a k-dimensional subspace of R^d. Such shapes capture important problems such as k-median, k-subspace approximation, and (j,l)-subspace clustering with j·l ≤ k. This dimensionality reduction reduces the data storage requirement from nnz(A), which could potentially be as large as nd, to (n+d)·k^3/ϵ^6. Our algorithm runs in time nnz(A)/ϵ^2 + (n+d)·poly(k/ϵ), up to logarithmic factors. For dense matrices, where nnz(A) ≈ nd, we give a faster algorithm that runs in time nd + (n+d)·poly(k/ϵ), up to logarithmic factors. Our dimensionality reduction algorithm can also be used to obtain poly(k/ϵ)-size coresets for the k-median and (k,1)-subspace approximation problems in polynomial time.
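To make the interface concrete, below is a minimal Python sketch of the storage-reduction idea described in the abstract: store each point's projection onto a low-dimensional subspace P together with its distance to P, so the total storage is roughly (n+d)·m numbers instead of nnz(A). The subspace here is computed with a plain truncated SVD as a stand-in; the paper's actual construction of P (of dimension O(k^3/ϵ^6), with the stated approximation guarantee for arbitrary k-dimensional shapes) is different, so this is an illustration of the data layout, not the algorithm itself.

```python
# Sketch of the reduced representation: projections onto a subspace P plus
# per-point distances to P. The subspace below is a truncated-SVD stand-in,
# NOT the paper's construction; names and parameters are hypothetical.

import numpy as np


def reduce_dimensionality(A: np.ndarray, m: int):
    """Return (coords, dists, basis) for an m-dimensional subspace P.

    A : (n, d) data matrix, one point per row.
    m : target subspace dimension (plays the role of O(k^3/eps^6)).
    """
    # Stand-in subspace: span of the top-m right singular vectors of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    basis = Vt[:m].T                              # (d, m) orthonormal basis of P

    coords = A @ basis                            # (n, m) projections onto P
    residual = A - coords @ basis.T               # components orthogonal to P
    dists = np.linalg.norm(residual, axis=1)      # (n,) distances to P

    # Storage: coords has n*m entries, basis has d*m, dists has n,
    # i.e. about (n + d) * m numbers instead of nnz(A).
    return coords, dists, basis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 200))
    coords, dists, basis = reduce_dimensionality(A, m=20)
    print(coords.shape, dists.shape, basis.shape)  # (1000, 20) (1000,) (200, 20)
```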