Consistent k-Clustering
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1975-1984, 2017.
Abstract
The study of online algorithms and competitive analysis provides a solid foundation for studying the quality of irrevocable decision making when the data arrives in an online manner. While in some scenarios the decisions are indeed irrevocable, there are many practical situations in which changing a previous decision is not impossible, but simply expensive. In this work we formalize this notion and introduce the consistent k-clustering problem. With points arriving online, the goal is to maintain a constant-factor approximate solution, while minimizing the number of reclusterings necessary. We prove a lower bound, showing that $\Omega(k \log n)$ changes are necessary in the worst case for a wide range of objective functions. On the positive side, we give an algorithm that needs only $O(k^2 \log^4 n)$ changes to maintain a constant-competitive solution, an exponential improvement on the naive solution of reclustering at every time step. Finally, we show experimentally that our approach performs much better than the theoretical bound, with the number of changes growing approximately as $O(\log n)$.
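To make the problem setting concrete, here is a minimal, hypothetical sketch of consistent k-clustering in the online model. It is not the paper's algorithm: it uses Gonzalez's greedy 2-approximation for the k-center objective and a simple lazy-reclustering rule (recluster only when the cost degrades by an assumed `slack` factor), and it simply counts how many centers change over the stream, which is the quantity the paper seeks to minimize.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def kcenter_greedy(points, k):
    """Gonzalez's greedy 2-approximation for k-center:
    repeatedly add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k and len(centers) < len(points):
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def clustering_cost(points, centers):
    """k-center objective: maximum distance of any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)


def consistent_clustering_stream(stream, k, slack=2.0):
    """Illustrative lazy-reclustering heuristic (not the paper's algorithm):
    keep the current centers while their cost stays within `slack` times the
    cost recorded at the last reclustering; otherwise recluster from scratch.
    Returns the final centers and the total number of center changes."""
    points, centers, last_cost, changes = [], [], float("inf"), 0
    for p in stream:
        points.append(p)
        if len(centers) < k:
            centers.append(p)            # first k points simply become centers
            changes += 1
            last_cost = clustering_cost(points, centers)
            continue
        if clustering_cost(points, centers) > slack * last_cost:
            new_centers = kcenter_greedy(points, k)
            changes += len(set(new_centers) - set(centers))  # centers swapped in
            centers = new_centers
            last_cost = clustering_cost(points, centers)
    return centers, changes


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stream drawn from three well-separated Gaussian clusters.
    stream = [(random.gauss(cx, 1.0), random.gauss(cy, 1.0))
              for cx, cy in random.choices([(0, 0), (10, 0), (0, 10)], k=300)]
    centers, changes = consistent_clustering_stream(stream, k=3)
    print(f"{changes} center changes over {len(stream)} points")
```

The heuristic above carries no guarantee on the number of changes; the paper's contribution is an algorithm whose total number of changes is provably bounded by $O(k^2 \log^4 n)$ while maintaining a constant-factor approximation at every step.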