Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion

Jinfeng Yi, Lijun Zhang, Rong Jin, Qi Qian, Anil Jain
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1400-1408, 2013.

Abstract

Many semi-supervised clustering algorithms have been proposed to improve the clustering accuracy by effectively exploring the available side information that is usually in the form of pairwise constraints. Despite the progress, there are two main shortcomings of the existing semi-supervised clustering algorithms. First, they have to deal with non-convex optimization problems, leading to clustering results that are sensitive to the initialization. Second, none of these algorithms is equipped with theoretical guarantee regarding the clustering performance. We address these limitations by developing a framework for semi-supervised clustering based on \it input pattern assisted matrix completion. The key idea is to cast clustering into a matrix completion problem, and solve it efficiently by exploiting the correlation between input patterns and cluster assignments. Our analysis shows that under appropriate conditions, only O(\log n) pairwise constraints are needed to accurately recover the true cluster partition. We verify the effectiveness of the proposed algorithm by comparing it to the state-of-the-art semi-supervised clustering algorithms on several benchmark datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-yi13, title = {Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion}, author = {Yi, Jinfeng and Zhang, Lijun and Jin, Rong and Qian, Qi and Jain, Anil}, booktitle = {Proceedings of the 30th International Conference on Machine Learning}, pages = {1400--1408}, year = {2013}, editor = {Dasgupta, Sanjoy and McAllester, David}, volume = {28}, number = {3}, series = {Proceedings of Machine Learning Research}, address = {Atlanta, Georgia, USA}, month = {17--19 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v28/yi13.pdf}, url = {https://proceedings.mlr.press/v28/yi13.html}, abstract = {Many semi-supervised clustering algorithms have been proposed to improve the clustering accuracy by effectively exploring the available side information that is usually in the form of pairwise constraints. Despite the progress, there are two main shortcomings of the existing semi-supervised clustering algorithms. First, they have to deal with non-convex optimization problems, leading to clustering results that are sensitive to the initialization. Second, none of these algorithms is equipped with theoretical guarantee regarding the clustering performance. We address these limitations by developing a framework for semi-supervised clustering based on \it input pattern assisted matrix completion. The key idea is to cast clustering into a matrix completion problem, and solve it efficiently by exploiting the correlation between input patterns and cluster assignments. Our analysis shows that under appropriate conditions, only O(\log n) pairwise constraints are needed to accurately recover the true cluster partition. We verify the effectiveness of the proposed algorithm by comparing it to the state-of-the-art semi-supervised clustering algorithms on several benchmark datasets.} }
Endnote
%0 Conference Paper %T Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion %A Jinfeng Yi %A Lijun Zhang %A Rong Jin %A Qi Qian %A Anil Jain %B Proceedings of the 30th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2013 %E Sanjoy Dasgupta %E David McAllester %F pmlr-v28-yi13 %I PMLR %P 1400--1408 %U https://proceedings.mlr.press/v28/yi13.html %V 28 %N 3 %X Many semi-supervised clustering algorithms have been proposed to improve the clustering accuracy by effectively exploring the available side information that is usually in the form of pairwise constraints. Despite the progress, there are two main shortcomings of the existing semi-supervised clustering algorithms. First, they have to deal with non-convex optimization problems, leading to clustering results that are sensitive to the initialization. Second, none of these algorithms is equipped with theoretical guarantee regarding the clustering performance. We address these limitations by developing a framework for semi-supervised clustering based on \it input pattern assisted matrix completion. The key idea is to cast clustering into a matrix completion problem, and solve it efficiently by exploiting the correlation between input patterns and cluster assignments. Our analysis shows that under appropriate conditions, only O(\log n) pairwise constraints are needed to accurately recover the true cluster partition. We verify the effectiveness of the proposed algorithm by comparing it to the state-of-the-art semi-supervised clustering algorithms on several benchmark datasets.
RIS
TY - CPAPER TI - Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion AU - Jinfeng Yi AU - Lijun Zhang AU - Rong Jin AU - Qi Qian AU - Anil Jain BT - Proceedings of the 30th International Conference on Machine Learning DA - 2013/05/26 ED - Sanjoy Dasgupta ED - David McAllester ID - pmlr-v28-yi13 PB - PMLR DP - Proceedings of Machine Learning Research VL - 28 IS - 3 SP - 1400 EP - 1408 L1 - http://proceedings.mlr.press/v28/yi13.pdf UR - https://proceedings.mlr.press/v28/yi13.html AB - Many semi-supervised clustering algorithms have been proposed to improve the clustering accuracy by effectively exploring the available side information that is usually in the form of pairwise constraints. Despite the progress, there are two main shortcomings of the existing semi-supervised clustering algorithms. First, they have to deal with non-convex optimization problems, leading to clustering results that are sensitive to the initialization. Second, none of these algorithms is equipped with theoretical guarantee regarding the clustering performance. We address these limitations by developing a framework for semi-supervised clustering based on \it input pattern assisted matrix completion. The key idea is to cast clustering into a matrix completion problem, and solve it efficiently by exploiting the correlation between input patterns and cluster assignments. Our analysis shows that under appropriate conditions, only O(\log n) pairwise constraints are needed to accurately recover the true cluster partition. We verify the effectiveness of the proposed algorithm by comparing it to the state-of-the-art semi-supervised clustering algorithms on several benchmark datasets. ER -
APA
Yi, J., Zhang, L., Jin, R., Qian, Q. & Jain, A.. (2013). Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):1400-1408 Available from https://proceedings.mlr.press/v28/yi13.html.

Related Material