Flickering Multi-Armed Bandits

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen
Proceedings of The 8th Annual Learning for Dynamics and Control Conference, PMLR 331:1-70, 2026.

Abstract

We introduce Flickering Multi-Armed Bandits (FMAB), a framework for sequential decision-making where action availability evolves dynamically and depends on the agent’s current action choice. This model captures environments where movement is restricted to local neighborhoods of a time-varying graph. We analyze the FMAB under two canonical graph processes: i.i.d Erdős–Rényi (ER) evolution and Edge-Markovian dynamics. To address the dual challenges of learning and navigation, we propose a two-phase algorithm that utilizes a lazy random walk for efficient exploration followed by a commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds, demonstrating that the exploration cost is near-optimal via a matching information-theoretic lower bound. Our results highlight the fundamental cost of exploration under local-move constraints. We compliment our theoretical guarantees with numerical simulations, including a disaster-response scenario involving a robotic vehicle navigating unstable communication landscapes.

Cite this Paper


BibTeX
@InProceedings{pmlr-v331-chakraborty26a, title = {Flickering Multi-Armed Bandits}, author = {Chakraborty, Sourav and Rege, Amit Kiran and Monteleoni, Claire and Chen, Lijun}, booktitle = {Proceedings of The 8th Annual Learning for Dynamics and Control Conference}, pages = {1--70}, year = {2026}, editor = {Sukhatme, Gaurav and Lindemann, Lars and Tu, Stephen and Wierman, Adam and Atanasov, Nikolay}, volume = {331}, series = {Proceedings of Machine Learning Research}, month = {17--19 Jun}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v331/main/assets/chakraborty26a/chakraborty26a.pdf}, url = {https://proceedings.mlr.press/v331/chakraborty26a.html}, abstract = {We introduce Flickering Multi-Armed Bandits (FMAB), a framework for sequential decision-making where action availability evolves dynamically and depends on the agent’s current action choice. This model captures environments where movement is restricted to local neighborhoods of a time-varying graph. We analyze the FMAB under two canonical graph processes: i.i.d Erdős–Rényi (ER) evolution and Edge-Markovian dynamics. To address the dual challenges of learning and navigation, we propose a two-phase algorithm that utilizes a lazy random walk for efficient exploration followed by a commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds, demonstrating that the exploration cost is near-optimal via a matching information-theoretic lower bound. Our results highlight the fundamental cost of exploration under local-move constraints. We compliment our theoretical guarantees with numerical simulations, including a disaster-response scenario involving a robotic vehicle navigating unstable communication landscapes.} }
Endnote
%0 Conference Paper %T Flickering Multi-Armed Bandits %A Sourav Chakraborty %A Amit Kiran Rege %A Claire Monteleoni %A Lijun Chen %B Proceedings of The 8th Annual Learning for Dynamics and Control Conference %C Proceedings of Machine Learning Research %D 2026 %E Gaurav Sukhatme %E Lars Lindemann %E Stephen Tu %E Adam Wierman %E Nikolay Atanasov %F pmlr-v331-chakraborty26a %I PMLR %P 1--70 %U https://proceedings.mlr.press/v331/chakraborty26a.html %V 331 %X We introduce Flickering Multi-Armed Bandits (FMAB), a framework for sequential decision-making where action availability evolves dynamically and depends on the agent’s current action choice. This model captures environments where movement is restricted to local neighborhoods of a time-varying graph. We analyze the FMAB under two canonical graph processes: i.i.d Erdős–Rényi (ER) evolution and Edge-Markovian dynamics. To address the dual challenges of learning and navigation, we propose a two-phase algorithm that utilizes a lazy random walk for efficient exploration followed by a commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds, demonstrating that the exploration cost is near-optimal via a matching information-theoretic lower bound. Our results highlight the fundamental cost of exploration under local-move constraints. We compliment our theoretical guarantees with numerical simulations, including a disaster-response scenario involving a robotic vehicle navigating unstable communication landscapes.
APA
Chakraborty, S., Rege, A.K., Monteleoni, C. & Chen, L.. (2026). Flickering Multi-Armed Bandits. Proceedings of The 8th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 331:1-70 Available from https://proceedings.mlr.press/v331/chakraborty26a.html.

Related Material