Thresholded linear bandits
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:6968-7020, 2023.
Abstract
We introduce the thresholded linear bandit problem, a novel sequential decision making problem at the interface of structured stochastic multi-armed bandits and learning halfspaces. The set of arms is $[0,1]^d$, the expected Bernoulli reward is piecewise constant with a jump at a separating hyperplane, and each arm is associated with a cost that is a positive linear combination of the arm’s components. This problem is motivated by several practical applications. For instance, imagine tuning the continuous features of an offer to a consumer; higher values incur higher cost to the vendor but result in a more attractive offer. At some threshold, the offer is attractive enough for a random consumer to accept at the higher probability level. For the one-dimensional case, we present Leftist, which enjoys $\log^2 T$ problem-dependent regret in favorable cases and has $\log(T)\sqrt{T}$ worst-case regret; we also give a lower bound suggesting this is unimprovable. We then present MD-Leftist, our extension of Leftist to the multi-dimensional case, which obtains similar regret bounds but with $d^{2.5}\log d$ and $d^{1.5}\log d$ dependence on dimension for the two types of bounds, respectively. Finally, we experimentally evaluate Leftist.
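To make the problem setup concrete, below is a minimal sketch (not taken from the paper) of the environment the abstract describes: arms in $[0,1]^d$, a Bernoulli mean that jumps at an unknown separating hyperplane, and a cost that is a positive linear combination of the arm's components. All names and parameter values (w, b, mu_low, mu_high, cost_weights) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 3
w = rng.uniform(0.5, 1.5, size=d)        # hidden hyperplane normal (assumed positive)
b = 1.0                                   # hidden hyperplane offset
mu_low, mu_high = 0.2, 0.7                # Bernoulli means below / above the threshold
cost_weights = rng.uniform(0.1, 1.0, d)   # positive cost coefficients

def pull(arm: np.ndarray) -> tuple[int, float]:
    """Play an arm in [0,1]^d; return (Bernoulli reward, linear cost)."""
    mean = mu_high if w @ arm >= b else mu_low
    reward = int(rng.binomial(1, mean))
    cost = float(cost_weights @ arm)
    return reward, cost

# Example: a cheap arm (likely below the threshold) vs. a costly arm above it.
print(pull(np.full(d, 0.2)))
print(pull(np.full(d, 0.9)))
```

In this sketch the learner's trade-off is visible directly: components must be set high enough to cross the unknown hyperplane and earn the higher reward probability, but every increase in a component raises the linear cost.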