Information Theoretic Model Validation for Spectral Clustering
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:495-503, 2012.
Model validation constitutes a fundamental step in data clustering. The central question is: Which cluster model and how many clusters are most appropriate for a certain application? In this study, we introduce a method for the validation of spectral clustering based upon approximation set coding. In particular, we compare correlation and pairwise clustering to analyze the correlations of temporal gene expression profiles. To evaluate and select clustering models, we calculate their reliable informativeness. Experimental results in the context of gene expression analysis show that pairwise clustering yields superior amounts of reliable information. The analysis results are consistent with the Bayesian Information Criterion (BIC), and exhibit higher generality than BIC.