Collaborative Clustering: Sample Complexity and Efficient Algorithms

Jungseul Ok; Se-Young Yun; Alexandre Proutiere; Rami Mochaourab

Collaborative Clustering: Sample Complexity and Efficient Algorithms

Jungseul Ok, Se-Young Yun, Alexandre Proutiere, Rami Mochaourab

Proceedings of the 28th International Conference on Algorithmic Learning Theory, PMLR 76:288-329, 2017.

Abstract

We study the problem of collaborative clustering. This problem is concerned with a set of items grouped into clusters that we wish to recover from ratings provided by users. The latter are also clustered, and each user rates a random but typical small number of items. The observed ratings are random variables whose distributions depend on the item and user clusters only. Unlike for collaborative filtering problems where one needs to recover both user and item clusters, here we only wish to classify items. The number of items rated by a user can be so small that anyway, estimating user clusters may be hopeless. For the collaborative clustering problem, we derive fundamental performance limits satisfied by any algorithm. Specifically, we identify the number of ratings needed to guarantee the existence of an algorithm recovering the clusters with a prescribed level of accuracy. We also propose SplitSpec, an algorithm whose performance matches these fundamental performance limit order-wise. In turn, SplitSpec is able to exploit, as much as this is possible, the users’ structure to improve the item cluster estimates.

Cite this Paper

BibTeX

@InProceedings{pmlr-v76-ok17a,
  title = 	 {Collaborative Clustering: Sample Complexity and Efficient Algorithms},
  author = 	 {Ok, Jungseul and Yun, Se-Young and Proutiere, Alexandre and Mochaourab, Rami},
  booktitle = 	 {Proceedings of the 28th International Conference on Algorithmic Learning Theory},
  pages = 	 {288--329},
  year = 	 {2017},
  editor = 	 {Hanneke, Steve and Reyzin, Lev},
  volume = 	 {76},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--17 Oct},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v76/ok17a/ok17a.pdf},
  url = 	 {https://proceedings.mlr.press/v76/ok17a.html},
  abstract = 	 {We study the problem of collaborative clustering. This problem is concerned with a set of items grouped into clusters that we wish to recover from ratings provided by users. The latter are also clustered, and each user rates a random but typical small number of items. The observed ratings are random variables whose distributions depend on the item and user clusters only. Unlike for collaborative filtering problems where one needs to recover both user and item clusters, here we only wish to classify items. The number of items rated by a user can be so small that anyway, estimating user clusters may be hopeless. For the collaborative clustering problem, we derive fundamental performance limits satisfied by any algorithm. Specifically, we identify the number of ratings needed to guarantee the existence of an algorithm recovering the clusters with a prescribed level of accuracy. We also propose SplitSpec, an algorithm whose performance matches these fundamental performance limit order-wise. In turn, SplitSpec is able to exploit, as much as this is possible, the users’ structure to improve the item cluster estimates.}
}

Endnote

%0 Conference Paper
%T Collaborative Clustering: Sample Complexity and Efficient Algorithms
%A Jungseul Ok
%A Se-Young Yun
%A Alexandre Proutiere
%A Rami Mochaourab
%B Proceedings of the 28th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Steve Hanneke
%E Lev Reyzin	
%F pmlr-v76-ok17a
%I PMLR
%P 288--329
%U https://proceedings.mlr.press/v76/ok17a.html
%V 76
%X We study the problem of collaborative clustering. This problem is concerned with a set of items grouped into clusters that we wish to recover from ratings provided by users. The latter are also clustered, and each user rates a random but typical small number of items. The observed ratings are random variables whose distributions depend on the item and user clusters only. Unlike for collaborative filtering problems where one needs to recover both user and item clusters, here we only wish to classify items. The number of items rated by a user can be so small that anyway, estimating user clusters may be hopeless. For the collaborative clustering problem, we derive fundamental performance limits satisfied by any algorithm. Specifically, we identify the number of ratings needed to guarantee the existence of an algorithm recovering the clusters with a prescribed level of accuracy. We also propose SplitSpec, an algorithm whose performance matches these fundamental performance limit order-wise. In turn, SplitSpec is able to exploit, as much as this is possible, the users’ structure to improve the item cluster estimates.

APA

Ok, J., Yun, S., Proutiere, A. & Mochaourab, R.. (2017). Collaborative Clustering: Sample Complexity and Efficient Algorithms. Proceedings of the 28th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 76:288-329 Available from https://proceedings.mlr.press/v76/ok17a.html.

Related Material

Download PDF