Convergence Rates of Gradient Descent and MM Algorithms for Bradley-Terry Models

Milan Vojnovic, Se-Young Yun, Kaifang Zhou
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1254-1264, 2020.

Abstract

We present tight convergence rate bounds for gradient descent and MM algorithms for maximum likelihood (ML) estimation and maximum a posteriori probability (MAP) estimation, a popular Bayesian inference method, for Bradley-Terry models of ranking data. Our results show that MM algorithms have the same convergence rate, up to a constant factor, as gradient descent algorithms with optimal constant step size. For the ML estimation objective, the convergence is linear, with the rate crucially determined by the algebraic connectivity of the matrix of item-pair co-occurrences in the observed comparison data. For the MAP estimation objective, we show that the convergence rate is also linear, with the rate determined by a parameter of the prior distribution in a way that can make convergence arbitrarily slow for small values of this parameter. The limit of small values of this parameter corresponds to a flat, non-informative prior distribution.
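As a concrete illustration of the two iteration schemes the abstract compares, below is a minimal NumPy sketch (not the authors' code) of the classical MM update for Bradley-Terry ML estimation and a constant-step gradient update on the log-likelihood in the log-parameterisation. The helper names mm_update and gd_update, the simulated comparison counts, and the step size 0.02 are illustrative assumptions; the sketch assumes the comparison graph is connected and every item both wins and loses at least once, so the ML estimate exists up to normalisation.

import numpy as np

def mm_update(w, M, W):
    """One MM step for Bradley-Terry ML: w_i <- W_i / sum_{j != i} M_ij / (w_i + w_j).
    M[i, j] is the number of comparisons between items i and j, W[i] the total wins of i."""
    denom = (M / (w[:, None] + w[None, :])).sum(axis=1)  # diagonal terms vanish since M_ii = 0
    w_new = W / denom
    return w_new / w_new.sum()  # remove the multiplicative scale ambiguity

def gd_update(theta, M, W, step):
    """One constant-step gradient step on the log-likelihood in theta = log w."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - theta[None, :])))  # model P(i beats j)
    grad = W - (M * p).sum(axis=1)                                # gradient of the log-likelihood
    theta_new = theta + step * grad
    return theta_new - theta_new.mean()  # remove the additive normalisation

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5
    true_w = rng.random(n) + 0.5
    # simulate 20 comparisons for every pair of items (illustrative data)
    wins = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            k = 20
            wins_ij = rng.binomial(k, true_w[i] / (true_w[i] + true_w[j]))
            wins[i, j], wins[j, i] = wins_ij, k - wins_ij
    M = wins + wins.T          # item-pair co-occurrence counts m_ij
    W = wins.sum(axis=1)       # total wins of each item

    w = np.ones(n) / n
    theta = np.zeros(n)
    for _ in range(200):
        w = mm_update(w, M, W)
        theta = gd_update(theta, M, W, step=0.02)
    print("MM estimate :", w)
    print("GD estimate :", np.exp(theta) / np.exp(theta).sum())
    print("ground truth:", true_w / true_w.sum())

Both iterations converge to the same normalised ML estimate on this example; the paper's results characterise how fast, in terms of the algebraic connectivity of the co-occurrence matrix M.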

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-vojnovic20a,
  title     = {Convergence Rates of Gradient Descent and MM Algorithms for Bradley-Terry Models},
  author    = {Vojnovic, Milan and Yun, Se-Young and Zhou, Kaifang},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {1254--1264},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/vojnovic20a/vojnovic20a.pdf},
  url       = {https://proceedings.mlr.press/v108/vojnovic20a.html},
  abstract  = {We present tight convergence rate bounds for gradient descent and MM algorithms for maximum likelihood (ML) estimation and maximum a posteriori probability (MAP) estimation of a popular Bayesian inference method, for Bradley-Terry models of ranking data. Our results show that MM algorithms have the same convergence rate, up to a constant factor, as gradient descent algorithms with optimal constant step size. For the ML estimation objective, the convergence is linear with the rate crucially determined by the algebraic connectivity of the matrix of item pair co-occurrences in observed comparison data. For the MAP estimation objective, we show that the convergence rate is also linear, with the rate determined by a parameter of the prior distribution in a way that can make convergence arbitrarily slow for small values of this parameter. The limit of small values of this parameter corresponds to a flat, non-informative prior distribution.}
}
Endnote
%0 Conference Paper
%T Convergence Rates of Gradient Descent and MM Algorithms for Bradley-Terry Models
%A Milan Vojnovic
%A Se-Young Yun
%A Kaifang Zhou
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-vojnovic20a
%I PMLR
%P 1254--1264
%U https://proceedings.mlr.press/v108/vojnovic20a.html
%V 108
%X We present tight convergence rate bounds for gradient descent and MM algorithms for maximum likelihood (ML) estimation and maximum a posteriori probability (MAP) estimation of a popular Bayesian inference method, for Bradley-Terry models of ranking data. Our results show that MM algorithms have the same convergence rate, up to a constant factor, as gradient descent algorithms with optimal constant step size. For the ML estimation objective, the convergence is linear with the rate crucially determined by the algebraic connectivity of the matrix of item pair co-occurrences in observed comparison data. For the MAP estimation objective, we show that the convergence rate is also linear, with the rate determined by a parameter of the prior distribution in a way that can make convergence arbitrarily slow for small values of this parameter. The limit of small values of this parameter corresponds to a flat, non-informative prior distribution.
APA
Vojnovic, M., Yun, S. & Zhou, K. (2020). Convergence Rates of Gradient Descent and MM Algorithms for Bradley-Terry Models. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:1254-1264. Available from https://proceedings.mlr.press/v108/vojnovic20a.html.