Online Density Estimation of Bradley-Terry Models

Issei Matsumoto, Kohei Hatano, Eiji Takimoto
Proceedings of The 28th Conference on Learning Theory, PMLR 40:1343-1359, 2015.

Abstract

We consider an online density estimation problem for the Bradley-Terry model, in which each model parameter defines the probability of a match result between any pair in a set of n teams. The problem is hard because the loss function (i.e., the negative log-likelihood function in our setting) is not convex. To avoid the non-convexity, we can change parameters so that the loss function becomes convex with respect to the new parameters. But then the radius K of the reparameterized domain, which depends on the outcome sequence, may be infinite. We therefore impose a mild assumption that guarantees that K is finite. We can thus employ standard online convex optimization algorithms, namely online gradient descent (OGD) and online Newton step (ONS), over the reparameterized domain, and get regret bounds O(n^{1/2}(\ln K)\sqrt{T}) and O(n^{3/2}K\ln T), respectively, where T is the horizon of the game. These bounds roughly mean that OGD is better when K is large while ONS is better when K is small. But how large can K be? We show that K can be as large as Θ(T^{n-1}), which implies that the worst-case regret bounds of OGD and ONS are O(n^{3/2}\sqrt{T}\ln T) and \tilde{O}(n^{3/2}T^{n-1}), respectively. We then propose a version of Follow the Regularized Leader whose regret bound is close to the minimum of those of OGD and ONS; in other words, our algorithm is competitive with both for a wide range of values of K. In particular, it achieves the worst-case regret bound O(n^{5/2}T^{1/3}\ln T), which is slightly better than that of OGD with respect to T. In addition, our algorithm works without knowledge of K, which is a practical advantage.
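To make the convexification concrete, below is a minimal Python sketch (not from the paper) of the standard log-reparameterization β_i = ln θ_i of the Bradley-Terry model, under which the per-match negative log-likelihood −ln P(i beats j) = ln(e^{β_i} + e^{β_j}) − β_i is convex (a log-sum-exp minus a linear term), together with one OGD round. The function names, step size, and projection onto a Euclidean ball of radius K are illustrative assumptions, not the paper's exact setup.

    import numpy as np

    def bt_loss_and_grad(beta, i, j):
        # Negative log-likelihood of "team i beats team j" under the
        # Bradley-Terry model with log-parameters beta (theta_k = e^{beta_k}):
        #   loss = ln(e^{beta_i} + e^{beta_j}) - beta_i,  convex in beta.
        m = max(beta[i], beta[j])                       # stabilize log-sum-exp
        lse = m + np.log(np.exp(beta[i] - m) + np.exp(beta[j] - m))
        p_i = np.exp(beta[i] - lse)                     # P(i beats j)
        grad = np.zeros_like(beta)
        grad[i] = p_i - 1.0
        grad[j] = 1.0 - p_i
        return lse - beta[i], grad

    def ogd_round(beta, i, j, eta, K):
        # One online gradient descent step over the reparameterized domain,
        # projecting back onto the Euclidean ball of radius K
        # (an illustrative choice of bounded domain).
        loss, grad = bt_loss_and_grad(beta, i, j)
        beta = beta - eta * grad
        norm = np.linalg.norm(beta)
        if norm > K:
            beta *= K / norm
        return beta, loss

    # Toy run: 4 teams, a stream of observed match results (i beat j).
    rng = np.random.default_rng(0)
    beta = np.zeros(4)
    for t in range(1, 101):
        i, j = rng.choice(4, size=2, replace=False)
        beta, loss = ogd_round(beta, i, j, eta=1.0 / np.sqrt(t), K=10.0)

The same loss-and-gradient routine would plug into an ONS or FTRL update; only the per-round update rule changes.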

Cite this Paper


BibTeX
@InProceedings{pmlr-v40-Matsumoto15,
  title     = {Online Density Estimation of Bradley-Terry Models},
  author    = {Matsumoto, Issei and Hatano, Kohei and Takimoto, Eiji},
  booktitle = {Proceedings of The 28th Conference on Learning Theory},
  pages     = {1343--1359},
  year      = {2015},
  editor    = {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume    = {40},
  series    = {Proceedings of Machine Learning Research},
  address   = {Paris, France},
  month     = {03--06 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v40/Matsumoto15.pdf},
  url       = {https://proceedings.mlr.press/v40/Matsumoto15.html}
}
APA
Matsumoto, I., Hatano, K. & Takimoto, E. (2015). Online Density Estimation of Bradley-Terry Models. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:1343-1359. Available from https://proceedings.mlr.press/v40/Matsumoto15.html.
