Concentric mixtures of Mallows models for top-$k$ rankings: sampling and identifiability

Fabien Collas, Ekhine Irurozki
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2079-2088, 2021.

Abstract

In this paper, we study mixtures of two Mallows models for top-$k$ rankings with equal location parameters but different scale parameters (a mixture of concentric Mallows models). These models arise when a heterogeneous population of voters is formed by two subpopulations, one of which consists of expert voters. We show the identifiability of both components and the learnability of their respective parameters. These results rest on, first, bounding the sample complexity of the Borda algorithm for top-$k$ rankings and, second, characterizing the distances between rankings, showing that an off-the-shelf clustering algorithm separates the rankings by component with high probability, provided the scales are well separated. As a by-product, we include an efficient sampling algorithm for Mallows top-$k$ rankings. Finally, since rank aggregation suffers from the large amount of noise introduced by the non-expert voters, we adapt the Borda algorithm to recover a ground-truth consensus ranking that is especially consistent with the expert rankings.
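The Borda aggregation the abstract refers to can be illustrated with a minimal sketch. This is not the paper's adapted algorithm; it is the standard Borda count extended to top-$k$ inputs under a common convention (assumed here, not taken from the paper): each unranked item is treated as tied below position $k$ and receives the average of the remaining Borda points.

```python
from collections import defaultdict

def borda_topk(rankings, n_items, k):
    """Aggregate top-k rankings by a Borda count.

    Each ranking is a tuple of k distinct item ids in 0..n_items-1,
    best item first. Ranked position p (0-indexed) earns n_items-1-p
    points; the n_items-k unranked items each earn the mean of the
    leftover points 0, 1, ..., n_items-k-1.
    """
    scores = defaultdict(float)
    tail_points = (n_items - k - 1) / 2.0  # mean of 0 .. n_items-k-1
    for r in rankings:
        ranked = set(r)
        for pos, item in enumerate(r):           # pos 0 = best
            scores[item] += n_items - 1 - pos    # top item gets n_items-1
        for item in range(n_items):
            if item not in ranked:
                scores[item] += tail_points
    # consensus: items sorted by descending score, ties broken by id
    return sorted(range(n_items), key=lambda i: (-scores[i], i))

# e.g. three voters ranking their top 2 out of 4 items:
# borda_topk([(0, 1), (0, 2), (1, 0)], n_items=4, k=2) -> [0, 1, 2, 3]
```

With noisy (non-expert) voters mixed in, this plain count degrades, which is the motivation for the weighted variant the paper develops.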

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-collas21a,
  title     = {Concentric mixtures of Mallows models for top-$k$ rankings: sampling and identifiability},
  author    = {Collas, Fabien and Irurozki, Ekhine},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {2079--2088},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/collas21a/collas21a.pdf},
  url       = {https://proceedings.mlr.press/v139/collas21a.html},
  abstract  = {In this paper, we study mixtures of two Mallows models for top-$k$ rankings with equal location parameters but different scale parameters (a mixture of concentric Mallows models). These models arise when a heterogeneous population of voters is formed by two subpopulations, one of which consists of expert voters. We show the identifiability of both components and the learnability of their respective parameters. These results rest on, first, bounding the sample complexity of the Borda algorithm for top-$k$ rankings and, second, characterizing the distances between rankings, showing that an off-the-shelf clustering algorithm separates the rankings by component with high probability, provided the scales are well separated. As a by-product, we include an efficient sampling algorithm for Mallows top-$k$ rankings. Finally, since rank aggregation suffers from the large amount of noise introduced by the non-expert voters, we adapt the Borda algorithm to recover a ground-truth consensus ranking that is especially consistent with the expert rankings.}
}
Endnote
%0 Conference Paper %T Concentric mixtures of Mallows models for top-$k$ rankings: sampling and identifiability %A Fabien Collas %A Ekhine Irurozki %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-collas21a %I PMLR %P 2079--2088 %U https://proceedings.mlr.press/v139/collas21a.html %V 139 %X In this paper, we study mixtures of two Mallows models for top-$k$ rankings with equal location parameters but different scale parameters (a mixture of concentric Mallows models). These models arise when a heterogeneous population of voters is formed by two subpopulations, one of which consists of expert voters. We show the identifiability of both components and the learnability of their respective parameters. These results rest on, first, bounding the sample complexity of the Borda algorithm for top-$k$ rankings and, second, characterizing the distances between rankings, showing that an off-the-shelf clustering algorithm separates the rankings by component with high probability, provided the scales are well separated. As a by-product, we include an efficient sampling algorithm for Mallows top-$k$ rankings. Finally, since rank aggregation suffers from the large amount of noise introduced by the non-expert voters, we adapt the Borda algorithm to recover a ground-truth consensus ranking that is especially consistent with the expert rankings.
APA
Collas, F. & Irurozki, E. (2021). Concentric mixtures of Mallows models for top-$k$ rankings: sampling and identifiability. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2079-2088. Available from https://proceedings.mlr.press/v139/collas21a.html.