A Technical Note on Non-Stationary Parametric Bandits: Existing Mistakes and Preliminary Solutions

Louis Faury, Yoan Russac, Marc Abeille, Clément Calauzènes
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:619-626, 2021.

Abstract

In this note we identify several mistakes appearing in the existing literature on non-stationary parametric bandits. More precisely, we study Generalized Linear Bandits (GLBs) in drifting environments, where the level of non-stationarity is characterized by a general metric known as the variation-budget. Existing methods to solve such problems typically involve forgetting mechanisms, which allow for a fine balance between the learning and tracking requirements of the problem. We uncover two significant mistakes in their theoretical analysis. The first arises when bounding the tracking error suffered by forgetting mechanisms. The second emerges when considering non-linear reward models, which requires extra care to balance the learning and tracking guarantees. We introduce a geometrical assumption on the arm set, sufficient to overcome the aforementioned technical gaps and recover minimax-optimality. We also share preliminary attempts at fixing those gaps under general configurations. Unfortunately, our solution yields degraded rates (w.r.t to the horizon), which raises new open questions regarding the optimality of forgetting mechanisms in non-stationary parametric bandits.

Cite this Paper


BibTeX
@InProceedings{pmlr-v132-faury21a, title = {A Technical Note on Non-Stationary Parametric Bandits: Existing Mistakes and Preliminary Solutions}, author = {Faury, Louis and Russac, Yoan and Abeille, Marc and Calauz\`enes, Cl\'ement}, booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory}, pages = {619--626}, year = {2021}, editor = {Vitaly Feldman and Katrina Ligett and Sivan Sabato}, volume = {132}, series = {Proceedings of Machine Learning Research}, month = {16--19 Mar}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v132/faury21a/faury21a.pdf}, url = { http://proceedings.mlr.press/v132/faury21a.html }, abstract = {In this note we identify several mistakes appearing in the existing literature on non-stationary parametric bandits. More precisely, we study Generalized Linear Bandits (GLBs) in drifting environments, where the level of non-stationarity is characterized by a general metric known as the variation-budget. Existing methods to solve such problems typically involve forgetting mechanisms, which allow for a fine balance between the learning and tracking requirements of the problem. We uncover two significant mistakes in their theoretical analysis. The first arises when bounding the tracking error suffered by forgetting mechanisms. The second emerges when considering non-linear reward models, which requires extra care to balance the learning and tracking guarantees. We introduce a geometrical assumption on the arm set, sufficient to overcome the aforementioned technical gaps and recover minimax-optimality. We also share preliminary attempts at fixing those gaps under general configurations. Unfortunately, our solution yields degraded rates (w.r.t to the horizon), which raises new open questions regarding the optimality of forgetting mechanisms in non-stationary parametric bandits.} }
Endnote
%0 Conference Paper %T A Technical Note on Non-Stationary Parametric Bandits: Existing Mistakes and Preliminary Solutions %A Louis Faury %A Yoan Russac %A Marc Abeille %A Clément Calauzènes %B Proceedings of the 32nd International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2021 %E Vitaly Feldman %E Katrina Ligett %E Sivan Sabato %F pmlr-v132-faury21a %I PMLR %P 619--626 %U http://proceedings.mlr.press/v132/faury21a.html %V 132 %X In this note we identify several mistakes appearing in the existing literature on non-stationary parametric bandits. More precisely, we study Generalized Linear Bandits (GLBs) in drifting environments, where the level of non-stationarity is characterized by a general metric known as the variation-budget. Existing methods to solve such problems typically involve forgetting mechanisms, which allow for a fine balance between the learning and tracking requirements of the problem. We uncover two significant mistakes in their theoretical analysis. The first arises when bounding the tracking error suffered by forgetting mechanisms. The second emerges when considering non-linear reward models, which requires extra care to balance the learning and tracking guarantees. We introduce a geometrical assumption on the arm set, sufficient to overcome the aforementioned technical gaps and recover minimax-optimality. We also share preliminary attempts at fixing those gaps under general configurations. Unfortunately, our solution yields degraded rates (w.r.t to the horizon), which raises new open questions regarding the optimality of forgetting mechanisms in non-stationary parametric bandits.
APA
Faury, L., Russac, Y., Abeille, M. & Calauzènes, C.. (2021). A Technical Note on Non-Stationary Parametric Bandits: Existing Mistakes and Preliminary Solutions. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:619-626 Available from http://proceedings.mlr.press/v132/faury21a.html .

Related Material