[edit]
Flickering Multi-Armed Bandits
Proceedings of The 8th Annual Learning for Dynamics and Control Conference, PMLR 331:1-70, 2026.
Abstract
We introduce Flickering Multi-Armed Bandits (FMAB), a framework for sequential decision-making where action availability evolves dynamically and depends on the agent’s current action choice. This model captures environments where movement is restricted to local neighborhoods of a time-varying graph. We analyze the FMAB under two canonical graph processes: i.i.d Erdős–Rényi (ER) evolution and Edge-Markovian dynamics. To address the dual challenges of learning and navigation, we propose a two-phase algorithm that utilizes a lazy random walk for efficient exploration followed by a commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds, demonstrating that the exploration cost is near-optimal via a matching information-theoretic lower bound. Our results highlight the fundamental cost of exploration under local-move constraints. We compliment our theoretical guarantees with numerical simulations, including a disaster-response scenario involving a robotic vehicle navigating unstable communication landscapes.