Multi-objective Bandits: Optimizing the Generalized Gini Index

Róbert Busa-Fekete, Balázs Szörényi, Paul Weng, Shie Mannor
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:625-634, 2017.

Abstract

We study the multi-armed bandit (MAB) problem where the agent receives a vectorial feedback that encodes many possibly competing objectives to be optimized. The goal of the agent is to find a policy, which can optimize these objectives simultaneously in a fair way. This multi-objective online optimization problem is formalized by using the Generalized Gini Index (GGI) aggregation function. We propose an online gradient descent algorithm which exploits the convexity of the GGI aggregation function, and controls the exploration in a careful way achieving a distribution-free regret $\tilde{O}(T^{-1/2} )$ with high probability. We test our algorithm on synthetic data as well as on an electric battery control problem where the goal is to trade off the use of the different cells of a battery in order to balance their respective degradation rates.

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-busa-fekete17a, title = {Multi-objective Bandits: Optimizing the Generalized {G}ini Index}, author = {R{\'o}bert Busa-Fekete and Bal{\'a}zs Sz{\"o}r{\'e}nyi and Paul Weng and Shie Mannor}, booktitle = {Proceedings of the 34th International Conference on Machine Learning}, pages = {625--634}, year = {2017}, editor = {Precup, Doina and Teh, Yee Whye}, volume = {70}, series = {Proceedings of Machine Learning Research}, month = {06--11 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v70/busa-fekete17a/busa-fekete17a.pdf}, url = {https://proceedings.mlr.press/v70/busa-fekete17a.html}, abstract = {We study the multi-armed bandit (MAB) problem where the agent receives a vectorial feedback that encodes many possibly competing objectives to be optimized. The goal of the agent is to find a policy, which can optimize these objectives simultaneously in a fair way. This multi-objective online optimization problem is formalized by using the Generalized Gini Index (GGI) aggregation function. We propose an online gradient descent algorithm which exploits the convexity of the GGI aggregation function, and controls the exploration in a careful way achieving a distribution-free regret $\tilde{O}(T^{-1/2} )$ with high probability. We test our algorithm on synthetic data as well as on an electric battery control problem where the goal is to trade off the use of the different cells of a battery in order to balance their respective degradation rates.} }
Endnote
%0 Conference Paper %T Multi-objective Bandits: Optimizing the Generalized Gini Index %A Róbert Busa-Fekete %A Balázs Szörényi %A Paul Weng %A Shie Mannor %B Proceedings of the 34th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2017 %E Doina Precup %E Yee Whye Teh %F pmlr-v70-busa-fekete17a %I PMLR %P 625--634 %U https://proceedings.mlr.press/v70/busa-fekete17a.html %V 70 %X We study the multi-armed bandit (MAB) problem where the agent receives a vectorial feedback that encodes many possibly competing objectives to be optimized. The goal of the agent is to find a policy, which can optimize these objectives simultaneously in a fair way. This multi-objective online optimization problem is formalized by using the Generalized Gini Index (GGI) aggregation function. We propose an online gradient descent algorithm which exploits the convexity of the GGI aggregation function, and controls the exploration in a careful way achieving a distribution-free regret $\tilde{O}(T^{-1/2} )$ with high probability. We test our algorithm on synthetic data as well as on an electric battery control problem where the goal is to trade off the use of the different cells of a battery in order to balance their respective degradation rates.
APA
Busa-Fekete, R., Szörényi, B., Weng, P. & Mannor, S.. (2017). Multi-objective Bandits: Optimizing the Generalized Gini Index. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:625-634 Available from https://proceedings.mlr.press/v70/busa-fekete17a.html.

Related Material