[edit]
Coresets for Ordered Weighted Clustering
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:744-753, 2019.
Abstract
We design coresets for Ordered k-Median, a generalization of classical clustering problems such as k-Median and k-Center. Its objective function is defined via the Ordered Weighted Averaging (OWA) paradigm of Yager (1988), where data points are weighted according to a predefined weight vector, but in order of their contribution to the objective (distance from the centers). A powerful data-reduction technique, called a coreset, is to summarize a point set X in Rd into a small (weighted) point set X′, such that for every set of k potential centers, the objective value of the coreset X′ approximates that of X within factor 1±ϵ. When there are multiple objectives (weights), the above standard coreset might have limited usefulness, whereas in a simultaneous coreset, the above approximation holds for all weights (in addition to all centers). Our main result is a construction of a simultaneous coreset of size Oϵ,d(k2log2|X|) for Ordered k-Median. We validate our algorithm on a real geographical data set, and we find our coreset leads to a massive speedup of clustering computations, while maintaining high accuracy for a range of weights.