Vector Optimization with Stochastic Bandit Feedback

Cagin Ararat; Cem Tekin

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:2165-2190, 2023.

Abstract

We introduce vector optimization problems with stochastic bandit feedback, in which preferences among designs are encoded by a polyhedral ordering cone $C$. Our setup generalizes the best arm identification problem to vector-valued rewards by extending the concept of Pareto set beyond multi-objective optimization. We characterize the sample complexity of ($\epsilon,\delta$)-PAC Pareto set identification by defining a new cone-dependent notion of complexity, called the ordering complexity. In particular, we provide gap-dependent and worst-case lower bounds on the sample complexity and show that, in the worst-case, the sample complexity scales with the square of ordering complexity. Furthermore, we investigate the sample complexity of the naïve elimination algorithm and prove that it nearly matches the worst-case sample complexity. Finally, we run experiments to verify our theoretical results and illustrate how $C$ and sampling budget affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set, and the success of identification.
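
To make the role of the ordering cone concrete, the following is a minimal sketch (not the authors' code) of how a Pareto set can depend on a polyhedral cone when the mean reward vectors are known exactly. It assumes the cone is given in halfspace form, $C = \{x : Wx \ge 0\}$, and that design $j$ dominates design $i$ when $\mu_j - \mu_i$ is a nonzero element of $C$; this is a common convention and may differ in detail from the paper's definitions. The names `W`, `means`, `in_cone`, `dominates`, and `pareto_set` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
TOL = 1e-12  # numerical tolerance for the comparisons below


def in_cone(W: Sequence[Vector], x: Vector) -> bool:
    """Return True if x lies in the polyhedral cone {x : W x >= 0}."""
    return all(sum(w * v for w, v in zip(row, x)) >= -TOL for row in W)


def dominates(W: Sequence[Vector], mu_j: Vector, mu_i: Vector) -> bool:
    """Assumed convention: j dominates i if mu_j - mu_i is a nonzero element of the cone."""
    diff = [a - b for a, b in zip(mu_j, mu_i)]
    nonzero = any(abs(d) > TOL for d in diff)
    return nonzero and in_cone(W, diff)


def pareto_set(W: Sequence[Vector], means: Sequence[Vector]) -> List[int]:
    """Indices of designs that no other design dominates under the cone ordering."""
    return [
        i
        for i, mu_i in enumerate(means)
        if not any(dominates(W, mu_j, mu_i) for j, mu_j in enumerate(means) if j != i)
    ]


# Four hypothetical designs with two-dimensional mean rewards.
means = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.2, 0.2)]

# Positive orthant: recovers the usual componentwise (multi-objective) order.
W_orthant = [(1.0, 0.0), (0.0, 1.0)]

# A wider cone {x : 3*x1 + 4*x2 >= 0, 4*x1 + 3*x2 >= 0} that contains the orthant.
W_wide = [(3.0, 4.0), (4.0, 3.0)]

print(pareto_set(W_orthant, means))  # [0, 1, 2]
print(pareto_set(W_wide, means))     # [2]
```

With the positive orthant, the three designs on the usual Pareto front survive. The wider cone makes more pairs of designs comparable, so the balanced design dominates both extreme ones and the Pareto set shrinks to a single index. In the bandit setting the means are unknown and must be estimated from noisy samples, which this sketch does not model.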

Cite this Paper

BibTeX


@InProceedings{pmlr-v206-ararat23a,
  title = 	 {Vector Optimization with Stochastic Bandit Feedback},
  author =       {Ararat, Cagin and Tekin, Cem},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2165--2190},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/ararat23a/ararat23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/ararat23a.html},
  abstract = 	 {We introduce vector optimization problems with stochastic bandit feedback, in which preferences among designs are encoded by a polyhedral ordering cone $C$. Our setup generalizes the best arm identification problem to vector-valued rewards by extending the concept of Pareto set beyond multi-objective optimization. We characterize the sample complexity of ($\epsilon,\delta$)-PAC Pareto set identification by defining a new cone-dependent notion of complexity, called the ordering complexity. In particular, we provide gap-dependent and worst-case lower bounds on the sample complexity and show that, in the worst-case, the sample complexity scales with the square of ordering complexity. Furthermore, we investigate the sample complexity of the na{\"i}ve elimination algorithm and prove that it nearly matches the worst-case sample complexity. Finally, we run experiments to verify our theoretical results and illustrate how $C$ and sampling budget affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set, and the success of identification.}
}

Endnote

%0 Conference Paper
%T Vector Optimization with Stochastic Bandit Feedback
%A Cagin Ararat
%A Cem Tekin
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-ararat23a
%I PMLR
%P 2165--2190
%U https://proceedings.mlr.press/v206/ararat23a.html
%V 206
%X We introduce vector optimization problems with stochastic bandit feedback, in which preferences among designs are encoded by a polyhedral ordering cone $C$. Our setup generalizes the best arm identification problem to vector-valued rewards by extending the concept of Pareto set beyond multi-objective optimization. We characterize the sample complexity of ($\epsilon,\delta$)-PAC Pareto set identification by defining a new cone-dependent notion of complexity, called the ordering complexity. In particular, we provide gap-dependent and worst-case lower bounds on the sample complexity and show that, in the worst-case, the sample complexity scales with the square of ordering complexity. Furthermore, we investigate the sample complexity of the naïve elimination algorithm and prove that it nearly matches the worst-case sample complexity. Finally, we run experiments to verify our theoretical results and illustrate how $C$ and sampling budget affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set, and the success of identification.

APA


Ararat, C. & Tekin, C. (2023). Vector Optimization with Stochastic Bandit Feedback. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:2165-2190. Available from https://proceedings.mlr.press/v206/ararat23a.html.
