On Euclidean k-Means Clustering with alpha-Center Proximity

Amit Deshpande; Anand Louis; Apoorv Singh

On Euclidean k-Means Clustering with alpha-Center Proximity

Amit Deshpande, Anand Louis, Apoorv Singh

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2087-2095, 2019.

Abstract

$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is \emph{center proximity}, that is, every point is closer to the center of its own cluster than the center of any other cluster, by some multiplicative factor $\alpha > 1$. We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm to find the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side we show that for any $\alpha’ > 1$, there exists an $\alpha \leq \alpha’$, $(\alpha >1)$, and an $\e_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative $(1+\e_0)$ factor.

Cite this Paper

BibTeX

@InProceedings{pmlr-v89-deshpande19a,
  title = 	 {On Euclidean k-Means Clustering with alpha-Center Proximity},
  author =       {Deshpande, Amit and Louis, Anand and Singh, Apoorv},
  booktitle = 	 {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2087--2095},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = 	 {89},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v89/deshpande19a/deshpande19a.pdf},
  url = 	 {https://proceedings.mlr.press/v89/deshpande19a.html},
  abstract = 	 {$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is \emph{center proximity}, that is, every point is closer to the center of its own cluster than the center of any other cluster, by some multiplicative factor $\alpha > 1$.  We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm to find the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side we show that for any $\alpha’ > 1$, there exists an $\alpha \leq \alpha’$, $(\alpha >1)$, and an $\e_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative $(1+\e_0)$ factor.}
}

Endnote

%0 Conference Paper
%T On Euclidean k-Means Clustering with alpha-Center Proximity
%A Amit Deshpande
%A Anand Louis
%A Apoorv Singh
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama	
%F pmlr-v89-deshpande19a
%I PMLR
%P 2087--2095
%U https://proceedings.mlr.press/v89/deshpande19a.html
%V 89
%X $k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is \emph{center proximity}, that is, every point is closer to the center of its own cluster than the center of any other cluster, by some multiplicative factor $\alpha > 1$.  We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm to find the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side we show that for any $\alpha’ > 1$, there exists an $\alpha \leq \alpha’$, $(\alpha >1)$, and an $\e_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative $(1+\e_0)$ factor.

APA

Deshpande, A., Louis, A. & Singh, A.. (2019). On Euclidean k-Means Clustering with alpha-Center Proximity. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:2087-2095 Available from https://proceedings.mlr.press/v89/deshpande19a.html.

On Euclidean k-Means Clustering with alpha-Center Proximity

Abstract

Cite this Paper

Related Material