Clustering: Science or Art?
Proceedings of ICML Workshop on Unsupervised and Transfer Learning, PMLR 27:65-79, 2012.
Abstract
We examine whether the quality of different clustering algorithms
can be compared by a general, scientifically sound procedure which
is independent of particular clustering algorithms. We argue that
the major obstacle is the difficulty in evaluating a clustering
algorithm without taking into account the context: why does the user
cluster his data in the first place, and what does he want to do
with the clustering afterwards? We argue that clustering should not
be treated as an application-independent mathematical problem, but
should always be studied in the context of its end-use. Different
techniques to evaluate clustering algorithms have to be developed
for different uses of clustering. To simplify this procedure, we
argue that it will be useful to build a “taxonomy of clustering
problems” to identify clustering applications that can be treated
in a unified way, and that such an effort will be more fruitful than
attempting the impossible: developing “optimal” domain-independent
clustering algorithms or even classifying clustering algorithms in
terms of how they work.