Approximate information maximization for bandit games

Alex Barbier Chebbah, Christian L. Vestergaard, Jean-Baptiste Masson, Etienne Boursier
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:316-324, 2025.

Abstract

Entropy maximization and free energy minimization are general physics principles for modeling dynamic systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al. 2000), and navigation in random environments using information maximization (Vergassola et al. 2007). Building on the latter principle, we propose a new class of bandit algorithms that maximize an approximation of the information about a key variable within the system. To this end, we develop an approximate, analytical, physics-based representation of the entropy to forecast the information gain of each action and greedily choose the action with the largest gain. This method yields strong performance in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the multi-armed bandit problem with Gaussian rewards. Since it encompasses the system’s properties in a single, global functional, this approach can be efficiently adapted to more complex bandit settings. This calls for further investigation of information maximization approaches for multi-armed bandit problems.
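
For intuition, the following is a minimal, self-contained Python sketch of the greedy information-maximization idea the abstract describes. It is not the paper's algorithm: the Monte Carlo entropy estimator, the wide Gaussian prior, and the one-step forecasting scheme below are illustrative assumptions standing in for the paper's analytical entropy approximation. The sketch maintains a Gaussian posterior over each arm's mean, estimates the entropy of the induced distribution over which arm is best, and pulls the arm forecast to reduce that entropy the most.

    import numpy as np

    rng = np.random.default_rng(0)

    def best_arm_entropy(mu, var, n_samples=2000):
        # Monte Carlo estimate of the entropy of the posterior probability
        # that each arm has the largest mean.
        draws = rng.normal(mu, np.sqrt(var), size=(n_samples, len(mu)))
        counts = np.bincount(draws.argmax(axis=1), minlength=len(mu))
        p = counts[counts > 0] / n_samples
        return float(-(p * np.log(p)).sum())

    def info_max_bandit(true_means, horizon=200, n_forecast=20):
        # Greedy information-maximization loop for Gaussian rewards
        # (unit reward variance, conjugate Gaussian posterior per arm).
        k = len(true_means)
        mu, var = np.zeros(k), np.full(k, 100.0)  # wide prior
        pulls = np.zeros(k, dtype=int)
        for _ in range(horizon):
            h_now = best_arm_entropy(mu, var)
            gains = np.empty(k)
            for i in range(k):
                # Forecast the expected posterior entropy after one more
                # pull of arm i, averaging over its posterior predictive.
                h_next = 0.0
                for _ in range(n_forecast):
                    r = rng.normal(mu[i], np.sqrt(var[i] + 1.0))
                    v = 1.0 / (1.0 / var[i] + 1.0)
                    mu2, var2 = mu.copy(), var.copy()
                    mu2[i], var2[i] = v * (mu[i] / var[i] + r), v
                    h_next += best_arm_entropy(mu2, var2, n_samples=500)
                gains[i] = h_now - h_next / n_forecast
            i = int(gains.argmax())          # largest forecast information gain
            r = rng.normal(true_means[i], 1.0)
            v = 1.0 / (1.0 / var[i] + 1.0)   # standard Gaussian posterior update
            mu[i], var[i] = v * (mu[i] / var[i] + r), v
            pulls[i] += 1
        return pulls

    print("pulls per arm:", info_max_bandit([0.0, 0.5, 1.0]))

Note that this caricature chooses actions purely for information, whereas the paper's approximation is designed so that information maximization also achieves asymptotically optimal regret; the sketch only illustrates the forecast-then-greedy structure.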

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-chebbah25a,
  title     = {Approximate information maximization for bandit games},
  author    = {Chebbah, Alex Barbier and Vestergaard, Christian L. and Masson, Jean-Baptiste and Boursier, Etienne},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {316--324},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/chebbah25a/chebbah25a.pdf},
  url       = {https://proceedings.mlr.press/v258/chebbah25a.html}
}
Endnote
%0 Conference Paper
%T Approximate information maximization for bandit games
%A Alex Barbier Chebbah
%A Christian L. Vestergaard
%A Jean-Baptiste Masson
%A Etienne Boursier
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-chebbah25a
%I PMLR
%P 316--324
%U https://proceedings.mlr.press/v258/chebbah25a.html
%V 258
APA
Chebbah, A.B., Vestergaard, C.L., Masson, J.-B. & Boursier, E. (2025). Approximate information maximization for bandit games. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:316-324. Available from https://proceedings.mlr.press/v258/chebbah25a.html.