Model-based Co-clustering for High Dimensional Sparse Data

Aghiles Salah, Nicoleta Rogovschi, Mohamed Nadif
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:866-874, 2016.

Abstract

We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. It thereby has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel co-clustering algorithms, one hard and one soft. Empirical results on numerous synthetic and real-world text datasets demonstrate the effectiveness of our approach for modelling high dimensional sparse data and for co-clustering. Furthermore, thanks to our formulation, which performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters (kappa), a well-known difficulty in classical vMF-based models.

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-salah16,
  title = {Model-based Co-clustering for High Dimensional Sparse Data},
  author = {Salah, Aghiles and Rogovschi, Nicoleta and Nadif, Mohamed},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = {866--874},
  year = {2016},
  editor = {Gretton, Arthur and Robert, Christian C.},
  volume = {51},
  series = {Proceedings of Machine Learning Research},
  address = {Cadiz, Spain},
  month = {09--11 May},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v51/salah16.pdf},
  url = {https://proceedings.mlr.press/v51/salah16.html},
  abstract = {We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. It thereby has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel co-clustering algorithms, one hard and one soft. Empirical results on numerous synthetic and real-world text datasets demonstrate the effectiveness of our approach for modelling high dimensional sparse data and for co-clustering. Furthermore, thanks to our formulation, which performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters (kappa), a well-known difficulty in classical vMF-based models.}
}
Endnote
%0 Conference Paper
%T Model-based Co-clustering for High Dimensional Sparse Data
%A Aghiles Salah
%A Nicoleta Rogovschi
%A Mohamed Nadif
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-salah16
%I PMLR
%P 866--874
%U https://proceedings.mlr.press/v51/salah16.html
%V 51
%X We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. It thereby has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel co-clustering algorithms, one hard and one soft. Empirical results on numerous synthetic and real-world text datasets demonstrate the effectiveness of our approach for modelling high dimensional sparse data and for co-clustering. Furthermore, thanks to our formulation, which performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters (kappa), a well-known difficulty in classical vMF-based models.
RIS
TY  - CPAPER
TI  - Model-based Co-clustering for High Dimensional Sparse Data
AU  - Aghiles Salah
AU  - Nicoleta Rogovschi
AU  - Mohamed Nadif
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-salah16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 866
EP  - 874
L1  - http://proceedings.mlr.press/v51/salah16.pdf
UR  - https://proceedings.mlr.press/v51/salah16.html
AB  - We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. It thereby has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel co-clustering algorithms, one hard and one soft. Empirical results on numerous synthetic and real-world text datasets demonstrate the effectiveness of our approach for modelling high dimensional sparse data and for co-clustering. Furthermore, thanks to our formulation, which performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters (kappa), a well-known difficulty in classical vMF-based models.
ER  -
APA
Salah, A., Rogovschi, N. & Nadif, M. (2016). Model-based Co-clustering for High Dimensional Sparse Data. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:866-874. Available from https://proceedings.mlr.press/v51/salah16.html.
