Flickering Multi-Armed Bandits

Sourav Chakraborty; Amit Kiran Rege; Claire Monteleoni; Lijun Chen

Flickering Multi-Armed Bandits

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

Proceedings of The 8th Annual Learning for Dynamics and Control Conference, PMLR 331:1-70, 2026.

Abstract

We introduce Flickering Multi-Armed Bandits (FMAB), a framework for sequential decision-making where action availability evolves dynamically and depends on the agent’s current action choice. This model captures environments where movement is restricted to local neighborhoods of a time-varying graph. We analyze the FMAB under two canonical graph processes: i.i.d Erdős–Rényi (ER) evolution and Edge-Markovian dynamics. To address the dual challenges of learning and navigation, we propose a two-phase algorithm that utilizes a lazy random walk for efficient exploration followed by a commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds, demonstrating that the exploration cost is near-optimal via a matching information-theoretic lower bound. Our results highlight the fundamental cost of exploration under local-move constraints. We compliment our theoretical guarantees with numerical simulations, including a disaster-response scenario involving a robotic vehicle navigating unstable communication landscapes.

Cite this Paper

BibTeX

@InProceedings{pmlr-v331-chakraborty26a,
  title = 	 {Flickering Multi-Armed Bandits},
  author =       {Chakraborty, Sourav and Rege, Amit Kiran and Monteleoni, Claire and Chen, Lijun},
  booktitle = 	 {Proceedings of The 8th Annual Learning for Dynamics and Control Conference},
  pages = 	 {1--70},
  year = 	 {2026},
  editor = 	 {Sukhatme, Gaurav and Lindemann, Lars and Tu, Stephen and Wierman, Adam and Atanasov, Nikolay},
  volume = 	 {331},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v331/main/assets/chakraborty26a/chakraborty26a.pdf},
  url = 	 {https://proceedings.mlr.press/v331/chakraborty26a.html},
  abstract = 	 {We introduce Flickering Multi-Armed Bandits (FMAB), a framework for sequential decision-making where action availability evolves dynamically and depends on the agent’s current action choice. This model captures environments where movement is restricted to local neighborhoods of a time-varying graph. We analyze the FMAB under two canonical graph processes: i.i.d Erdős–Rényi (ER) evolution and Edge-Markovian dynamics. To address the dual challenges of learning and navigation, we propose a two-phase algorithm that utilizes a lazy random walk for efficient exploration followed by a commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds, demonstrating that the exploration cost is near-optimal via a matching information-theoretic lower bound. Our results highlight the fundamental cost of exploration under local-move constraints. We compliment our theoretical guarantees with numerical simulations, including a disaster-response scenario involving a robotic vehicle navigating unstable communication landscapes.}
}

Endnote

%0 Conference Paper
%T Flickering Multi-Armed Bandits
%A Sourav Chakraborty
%A Amit Kiran Rege
%A Claire Monteleoni
%A Lijun Chen
%B Proceedings of The 8th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2026
%E Gaurav Sukhatme
%E Lars Lindemann
%E Stephen Tu
%E Adam Wierman
%E Nikolay Atanasov	
%F pmlr-v331-chakraborty26a
%I PMLR
%P 1--70
%U https://proceedings.mlr.press/v331/chakraborty26a.html
%V 331
%X We introduce Flickering Multi-Armed Bandits (FMAB), a framework for sequential decision-making where action availability evolves dynamically and depends on the agent’s current action choice. This model captures environments where movement is restricted to local neighborhoods of a time-varying graph. We analyze the FMAB under two canonical graph processes: i.i.d Erdős–Rényi (ER) evolution and Edge-Markovian dynamics. To address the dual challenges of learning and navigation, we propose a two-phase algorithm that utilizes a lazy random walk for efficient exploration followed by a commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds, demonstrating that the exploration cost is near-optimal via a matching information-theoretic lower bound. Our results highlight the fundamental cost of exploration under local-move constraints. We compliment our theoretical guarantees with numerical simulations, including a disaster-response scenario involving a robotic vehicle navigating unstable communication landscapes.

APA

Chakraborty, S., Rege, A.K., Monteleoni, C. & Chen, L.. (2026). Flickering Multi-Armed Bandits. Proceedings of The 8th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 331:1-70 Available from https://proceedings.mlr.press/v331/chakraborty26a.html.

Flickering Multi-Armed Bandits

Abstract

Cite this Paper

Related Material