Model-based Co-clustering for High Dimensional Sparse Data


Aghiles Salah, Nicoleta Rogovschi, Mohamed Nadif
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:866-874, 2016.


We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high-dimensional sparse matrices. Whereas existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix and thereby exploits the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel co-clustering algorithms, one hard and one soft. Empirical results on numerous synthetic and real-world text datasets demonstrate the effectiveness of our approach for modelling high-dimensional sparse data and for co-clustering. Furthermore, because our formulation performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters (the κ's), a well-known difficulty in classical vMF-based models.
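
For reference, the standard vMF density that such models build on can be written as follows. This is the textbook form for a unit vector x on the sphere in d dimensions, not the co-clustering model of the paper; the symbols x, μ, d, c_d and I_ν are the usual notation and are assumptions here, since the abstract does not fix any.

```latex
% Standard von Mises-Fisher density on the unit sphere S^{d-1} (textbook form).
% x: unit-norm observation, \mu: unit-norm mean direction, \kappa \ge 0: concentration.
f(x \mid \mu, \kappa) = c_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad
c_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2}\, I_{d/2 - 1}(\kappa)},
% I_{\nu} denotes the modified Bessel function of the first kind of order \nu.
```

The normalizer c_d(κ) depends on the dimension d through a Bessel function of order d/2 - 1, which gives some intuition for why concentration parameters tend to become hard to handle when d is very large.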
