On Similarity Prediction and Pairwise Clustering

Stephen Pasteris, Fabio Vitale, Claudio Gentile, Mark Herbster
Proceedings of Algorithmic Learning Theory, PMLR 83:654-681, 2018.

Abstract

We consider the problem of clustering a finite set of items from pairwise similarity information. Unlike what is done in the literature on this subject, we do so in a passive learning setting, and with no specific constraints on the cluster shapes other than their size. We investigate the problem in different settings: i. an online setting, where we provide a tight characterization of the prediction complexity in the mistake bound model, and ii. a standard stochastic batch setting, where we give tight upper and lower bounds on the achievable generalization error. Prediction performance is measured both in terms of the ability to recover the similarity function encoding the hidden clustering and in terms of how well we classify each item within the set. The proposed algorithms are time efficient.

Cite this Paper


BibTeX
@InProceedings{pmlr-v83-pasteris18a, title = {On Similarity Prediction and Pairwise Clustering}, author = {Pasteris, Stephen and Vitale, Fabio and Gentile, Claudio and Herbster, Mark}, booktitle = {Proceedings of Algorithmic Learning Theory}, pages = {654--681}, year = {2018}, editor = {Janoos, Firdaus and Mohri, Mehryar and Sridharan, Karthik}, volume = {83}, series = {Proceedings of Machine Learning Research}, month = {07--09 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v83/pasteris18a/pasteris18a.pdf}, url = {https://proceedings.mlr.press/v83/pasteris18a.html}, abstract = {We consider the problem of clustering a finite set of items from pairwise similarity information. Unlike what is done in the literature on this subject, we do so in a passive learning setting, and with no specific constraints on the cluster shapes other than their size. We investigate the problem in different settings: i. an online setting, where we provide a tight characterization of the prediction complexity in the mistake bound model, and ii. a standard stochastic batch setting, where we give tight upper and lower bounds on the achievable generalization error. Prediction performance is measured both in terms of the ability to recover the similarity function encoding the hidden clustering and in terms of how well we classify each item within the set. The proposed algorithms are time efficient.} }
Endnote
%0 Conference Paper %T On Similarity Prediction and Pairwise Clustering %A Stephen Pasteris %A Fabio Vitale %A Claudio Gentile %A Mark Herbster %B Proceedings of Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2018 %E Firdaus Janoos %E Mehryar Mohri %E Karthik Sridharan %F pmlr-v83-pasteris18a %I PMLR %P 654--681 %U https://proceedings.mlr.press/v83/pasteris18a.html %V 83 %X We consider the problem of clustering a finite set of items from pairwise similarity information. Unlike what is done in the literature on this subject, we do so in a passive learning setting, and with no specific constraints on the cluster shapes other than their size. We investigate the problem in different settings: i. an online setting, where we provide a tight characterization of the prediction complexity in the mistake bound model, and ii. a standard stochastic batch setting, where we give tight upper and lower bounds on the achievable generalization error. Prediction performance is measured both in terms of the ability to recover the similarity function encoding the hidden clustering and in terms of how well we classify each item within the set. The proposed algorithms are time efficient.
APA
Pasteris, S., Vitale, F., Gentile, C. & Herbster, M.. (2018). On Similarity Prediction and Pairwise Clustering. Proceedings of Algorithmic Learning Theory, in Proceedings of Machine Learning Research 83:654-681 Available from https://proceedings.mlr.press/v83/pasteris18a.html.

Related Material