Zeroth Order Non-convex optimization with Dueling-Choice Bandits

Yichong Xu, Aparna Joshi, Aarti Singh, Artur Dubrawski
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:899-908, 2020.

Abstract

We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits, since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with a theoretical guarantee of $O(\frac{\Phi}{\sqrt{T}})$ on simple regret, where $T$ is the number of direct queries and $\Phi$ is an improved information gain stemming from a comparison-based constraint set that restricts the search space for the optimum. In contrast, in the plain direct-query setting, $\Phi$ depends on the entire domain. We discuss theoretical aspects and show experimental results to demonstrate the efficacy of our algorithm.
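For intuition only, below is a minimal Python sketch of the dueling-choice idea described in the abstract: a GP-UCB-style loop whose UCB maximization is restricted to a candidate set that has been pruned by pairwise duels. This is not the paper's COMP-GP-UCB algorithm; the objective, the duel noise model, the loss-a-duel filtering rule, and all helper names (duel, gp_posterior, rbf_kernel) are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy non-convex objective on [0, 1] standing in for the unknown function.
    return np.sin(6 * x) + 0.5 * np.cos(11 * x)

def duel(x, y, noise=0.1):
    # Comparison oracle: returns whichever point is judged to have the larger
    # (noisy) function value -- the "duel" feedback described in the abstract.
    return x if f(x) + noise * rng.standard_normal() > f(y) + noise * rng.standard_normal() else y

def rbf_kernel(a, b, ls=0.1):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, noise=0.05):
    # Standard GP-regression posterior mean and std at the test points Xs.
    K = rbf_kernel(X, X) + noise ** 2 * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)  # k(x, x) = 1 for RBF
    return mu, np.sqrt(var)

grid = np.linspace(0.0, 1.0, 200)      # candidate points
active = grid.copy()                   # comparison-filtered search space
X_obs = [rng.uniform()]                # direct queries made so far
y_obs = [f(X_obs[0]) + 0.05 * rng.standard_normal()]
beta = 2.0                             # UCB exploration weight

for t in range(20):
    # Dueling phase (illustrative filter, not the paper's rule): duel each active
    # candidate against a random reference and drop those that lose, shrinking
    # the space over which the UCB is maximized.
    ref = active[rng.integers(len(active))]
    survives = np.array([x == ref or duel(x, ref) == x for x in active])
    if survives.sum() >= 5:            # keep the active set from collapsing
        active = active[survives]
    # Direct-query phase: maximize the UCB over the filtered set only.
    mu, sd = gp_posterior(np.array(X_obs), np.array(y_obs), active)
    x_next = active[np.argmax(mu + beta * sd)]
    X_obs.append(x_next)
    y_obs.append(f(x_next) + 0.05 * rng.standard_normal())

print("best queried point:", X_obs[int(np.argmax(y_obs))])
```

The paper develops the actual constraint set and the $O(\frac{\Phi}{\sqrt{T}})$ simple-regret guarantee; the sketch is only meant to make the interplay between direct queries and duels concrete.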

Cite this Paper


BibTeX
@InProceedings{pmlr-v124-xu20b, title = {Zeroth Order Non-convex optimization with Dueling-Choice Bandits}, author = {Xu, Yichong and Joshi, Aparna and Singh, Aarti and Dubrawski, Artur}, booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)}, pages = {899--908}, year = {2020}, editor = {Peters, Jonas and Sontag, David}, volume = {124}, series = {Proceedings of Machine Learning Research}, month = {03--06 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v124/xu20b/xu20b.pdf}, url = {https://proceedings.mlr.press/v124/xu20b.html}, abstract = {We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits, since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with theoretical guarantee of $O(\frac{\Phi}{\sqrt{T}})$ on simple regret where $T$ is the number of direct queries and $\Phi$ is an improved information gain stemming from a comparison-based constraint set that restricts the space for optimum search. In contrast, in the plain direct query setting, $\Phi$ depends on the entire domain. We discuss theoretical aspects and show experimental results to demonstrate efficacy of our algorithm.} }
Endnote
%0 Conference Paper %T Zeroth Order Non-convex optimization with Dueling-Choice Bandits %A Yichong Xu %A Aparna Joshi %A Aarti Singh %A Artur Dubrawski %B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI) %C Proceedings of Machine Learning Research %D 2020 %E Jonas Peters %E David Sontag %F pmlr-v124-xu20b %I PMLR %P 899--908 %U https://proceedings.mlr.press/v124/xu20b.html %V 124 %X We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits, since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with theoretical guarantee of $O(\frac{\Phi}{\sqrt{T}})$ on simple regret where $T$ is the number of direct queries and $\Phi$ is an improved information gain stemming from a comparison-based constraint set that restricts the space for optimum search. In contrast, in the plain direct query setting, $\Phi$ depends on the entire domain. We discuss theoretical aspects and show experimental results to demonstrate efficacy of our algorithm.
APA
Xu, Y., Joshi, A., Singh, A., & Dubrawski, A. (2020). Zeroth Order Non-convex optimization with Dueling-Choice Bandits. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:899-908. Available from https://proceedings.mlr.press/v124/xu20b.html.