ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits

Thomas Kleine Buening; Aadirupa Saha

ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits

Thomas Kleine Buening, Aadirupa Saha

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:3854-3878, 2023.

Abstract

We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem. The only two existing attempts in this line of work fall short across multiple dimensions, including pessimistic measures of non-stationary complexity and non-adaptive parameter tuning that requires knowledge of the number of preference changes. We develop an elimination-based rescheduling algorithm to overcome these shortcomings and show a near-optimal $\tilde O(\sqrt{S^{CW} T})$ dynamic regret bound, where $\S^{CW}$ is the number of times the Condorcet winner changes in $T$ rounds. This yields the first near-optimal dynamic regret bound for unknown $S^{CW}$. We further study other related notions of non-stationarity for which we also prove near-optimal dynamic regret guarantees under additional assumptions on the preference model.

Cite this Paper

BibTeX

@InProceedings{pmlr-v206-kleine-buening23a,
  title = 	 {ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits},
  author =       {Kleine Buening, Thomas and Saha, Aadirupa},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3854--3878},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/kleine-buening23a/kleine-buening23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/kleine-buening23a.html},
  abstract = 	 {We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem. The only two existing attempts in this line of work fall short across multiple dimensions, including pessimistic measures of non-stationary complexity and non-adaptive parameter tuning that requires knowledge of the number of preference changes. We develop an elimination-based rescheduling algorithm to overcome these shortcomings and show a near-optimal $\tilde O(\sqrt{S^{CW} T})$ dynamic regret bound, where $\S^{CW}$ is the number of times the Condorcet winner changes in $T$ rounds. This yields the first near-optimal dynamic regret bound for unknown $S^{CW}$. We further study other related notions of non-stationarity for which we also prove near-optimal dynamic regret guarantees under additional assumptions on the preference model.}
}

Endnote

%0 Conference Paper
%T ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
%A Thomas Kleine Buening
%A Aadirupa Saha
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent	
%F pmlr-v206-kleine-buening23a
%I PMLR
%P 3854--3878
%U https://proceedings.mlr.press/v206/kleine-buening23a.html
%V 206
%X We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem. The only two existing attempts in this line of work fall short across multiple dimensions, including pessimistic measures of non-stationary complexity and non-adaptive parameter tuning that requires knowledge of the number of preference changes. We develop an elimination-based rescheduling algorithm to overcome these shortcomings and show a near-optimal $\tilde O(\sqrt{S^{CW} T})$ dynamic regret bound, where $\S^{CW}$ is the number of times the Condorcet winner changes in $T$ rounds. This yields the first near-optimal dynamic regret bound for unknown $S^{CW}$. We further study other related notions of non-stationarity for which we also prove near-optimal dynamic regret guarantees under additional assumptions on the preference model.

APA

Kleine Buening, T. & Saha, A.. (2023). ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:3854-3878 Available from https://proceedings.mlr.press/v206/kleine-buening23a.html.

Related Material

Download PDF