The Information-Theoretic Requirements of Subspace Clustering with Missing Data

Daniel Pimentel-Alarcon, Robert Nowak
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:802-810, 2016.

Abstract

Subspace clustering with missing data (SCMD) is a useful tool for analyzing incomplete datasets. Let d be the ambient dimension, and r the dimension of the subspaces. Existing theory shows that Nk = O(r d) columns per subspace are necessary for SCMD, and Nk =O(min d^(log d), d^(r+1) ) are sufficient. We close this gap, showing that Nk =O(r d) is also sufficient. To do this we derive deterministic sampling conditions for SCMD, which give precise information theoretic requirements and determine sampling regimes. These results explain the performance of SCMD algorithms from the literature. Finally, we give a practical algorithm to certify the output of any SCMD method deterministically.

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-pimentel-alarcon16, title = {The Information-Theoretic Requirements of Subspace Clustering with Missing Data}, author = {Pimentel-Alarcon, Daniel and Nowak, Robert}, booktitle = {Proceedings of The 33rd International Conference on Machine Learning}, pages = {802--810}, year = {2016}, editor = {Balcan, Maria Florina and Weinberger, Kilian Q.}, volume = {48}, series = {Proceedings of Machine Learning Research}, address = {New York, New York, USA}, month = {20--22 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v48/pimentel-alarcon16.pdf}, url = { http://proceedings.mlr.press/v48/pimentel-alarcon16.html }, abstract = {Subspace clustering with missing data (SCMD) is a useful tool for analyzing incomplete datasets. Let d be the ambient dimension, and r the dimension of the subspaces. Existing theory shows that Nk = O(r d) columns per subspace are necessary for SCMD, and Nk =O(min d^(log d), d^(r+1) ) are sufficient. We close this gap, showing that Nk =O(r d) is also sufficient. To do this we derive deterministic sampling conditions for SCMD, which give precise information theoretic requirements and determine sampling regimes. These results explain the performance of SCMD algorithms from the literature. Finally, we give a practical algorithm to certify the output of any SCMD method deterministically.} }
Endnote
%0 Conference Paper %T The Information-Theoretic Requirements of Subspace Clustering with Missing Data %A Daniel Pimentel-Alarcon %A Robert Nowak %B Proceedings of The 33rd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2016 %E Maria Florina Balcan %E Kilian Q. Weinberger %F pmlr-v48-pimentel-alarcon16 %I PMLR %P 802--810 %U http://proceedings.mlr.press/v48/pimentel-alarcon16.html %V 48 %X Subspace clustering with missing data (SCMD) is a useful tool for analyzing incomplete datasets. Let d be the ambient dimension, and r the dimension of the subspaces. Existing theory shows that Nk = O(r d) columns per subspace are necessary for SCMD, and Nk =O(min d^(log d), d^(r+1) ) are sufficient. We close this gap, showing that Nk =O(r d) is also sufficient. To do this we derive deterministic sampling conditions for SCMD, which give precise information theoretic requirements and determine sampling regimes. These results explain the performance of SCMD algorithms from the literature. Finally, we give a practical algorithm to certify the output of any SCMD method deterministically.
RIS
TY - CPAPER TI - The Information-Theoretic Requirements of Subspace Clustering with Missing Data AU - Daniel Pimentel-Alarcon AU - Robert Nowak BT - Proceedings of The 33rd International Conference on Machine Learning DA - 2016/06/11 ED - Maria Florina Balcan ED - Kilian Q. Weinberger ID - pmlr-v48-pimentel-alarcon16 PB - PMLR DP - Proceedings of Machine Learning Research VL - 48 SP - 802 EP - 810 L1 - http://proceedings.mlr.press/v48/pimentel-alarcon16.pdf UR - http://proceedings.mlr.press/v48/pimentel-alarcon16.html AB - Subspace clustering with missing data (SCMD) is a useful tool for analyzing incomplete datasets. Let d be the ambient dimension, and r the dimension of the subspaces. Existing theory shows that Nk = O(r d) columns per subspace are necessary for SCMD, and Nk =O(min d^(log d), d^(r+1) ) are sufficient. We close this gap, showing that Nk =O(r d) is also sufficient. To do this we derive deterministic sampling conditions for SCMD, which give precise information theoretic requirements and determine sampling regimes. These results explain the performance of SCMD algorithms from the literature. Finally, we give a practical algorithm to certify the output of any SCMD method deterministically. ER -
APA
Pimentel-Alarcon, D. & Nowak, R.. (2016). The Information-Theoretic Requirements of Subspace Clustering with Missing Data. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:802-810 Available from http://proceedings.mlr.press/v48/pimentel-alarcon16.html .

Related Material