Convergence Rates of Gradient Descent and MM Algorithms for Bradley-Terry Models
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1254-1264, 2020.
We present tight convergence rate bounds for gradient descent and MM algorithms for maximum likelihood (ML) estimation and maximum a posteriori probability (MAP) estimation of a popular Bayesian inference method, for Bradley-Terry models of ranking data. Our results show that MM algorithms have the same convergence rate, up to a constant factor, as gradient descent algorithms with optimal constant step size. For the ML estimation objective, the convergence is linear with the rate crucially determined by the algebraic connectivity of the matrix of item pair co-occurrences in observed comparison data. For the MAP estimation objective, we show that the convergence rate is also linear, with the rate determined by a parameter of the prior distribution in a way that can make convergence arbitrarily slow for small values of this parameter. The limit of small values of this parameter corresponds to a flat, non-informative prior distribution.