Model-based Co-clustering for High Dimensional Sparse Data

Aghiles Salah; Nicoleta Rogovschi; Mohamed Nadif

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:866-874, 2016.

Abstract

We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix and thereby exploits the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel co-clustering algorithms, one hard and one soft. Empirical results on numerous synthetic and real-world text datasets demonstrate the effectiveness of our approach for modelling high dimensional sparse data and for co-clustering. Furthermore, because our formulation performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters (kappa), a well-known difficulty in classical vMF-based models.

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-salah16,
  title = 	 {Model-based Co-clustering for High Dimensional Sparse Data},
  author = 	 {Salah, Aghiles and Rogovschi, Nicoleta and Nadif, Mohamed},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {866--874},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/salah16.pdf},
  url = 	 {https://proceedings.mlr.press/v51/salah16.html},
  abstract = 	 {We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. Thereby it has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel, hard and soft, co-clustering algorithms. Empirical results on numerous synthetic and real-world text datasets, demonstrate the effectiveness of our approach, for modelling high dimensional sparse data and co-clustering. Furthermore, thanks to our formulation, that performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters kappa’s, a well known difficulty in the classical vMF-based models.}
}

Endnote

%0 Conference Paper
%T Model-based Co-clustering for High Dimensional Sparse Data
%A Aghiles Salah
%A Nicoleta Rogovschi
%A Mohamed Nadif
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-salah16
%I PMLR
%P 866--874
%U https://proceedings.mlr.press/v51/salah16.html
%V 51
%X We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. Thereby it has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel, hard and soft, co-clustering algorithms. Empirical results on numerous synthetic and real-world text datasets, demonstrate the effectiveness of our approach, for modelling high dimensional sparse data and co-clustering. Furthermore, thanks to our formulation, that performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters kappa’s, a well known difficulty in the classical vMF-based models.

RIS


TY  - CPAPER
TI  - Model-based Co-clustering for High Dimensional Sparse Data
AU  - Aghiles Salah
AU  - Nicoleta Rogovschi
AU  - Mohamed Nadif
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-salah16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 866
EP  - 874
L1  - http://proceedings.mlr.press/v51/salah16.pdf
UR  - https://proceedings.mlr.press/v51/salah16.html
AB  - We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. Thereby it has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel, hard and soft, co-clustering algorithms. Empirical results on numerous synthetic and real-world text datasets, demonstrate the effectiveness of our approach, for modelling high dimensional sparse data and co-clustering. Furthermore, thanks to our formulation, that performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters kappa’s, a well known difficulty in the classical vMF-based models.
ER  -

APA


Salah, A., Rogovschi, N. & Nadif, M. (2016). Model-based Co-clustering for High Dimensional Sparse Data. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:866-874. Available from https://proceedings.mlr.press/v51/salah16.html.
