Thresholded linear bandits

Nishant A. Mehta, Junpei Komiyama, Vamsi K. Potluru, Andrea Nguyen, Mica Grant-Hagen
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:6968-7020, 2023.

Abstract

We introduce the thresholded linear bandit problem, a novel sequential decision making problem at the interface of structured stochastic multi-armed bandits and learning halfspaces. The set of arms is $[0, 1]^d$, the expected Bernoulli reward is piecewise constant with a jump at a separating hyperplane, and each arm is associated with a cost that is a positive linear combination of the arm's components. This problem is motivated by several practical applications. For instance, imagine tuning the continuous features of an offer to a consumer; higher values incur higher cost to the vendor but result in a more attractive offer. At some threshold, the offer is attractive enough for a random consumer to accept at the higher probability level. For the one-dimensional case, we present Leftist, which enjoys $\log^2 T$ problem-dependent regret in favorable cases and has $\log(T) \sqrt{T}$ worst-case regret; we also give a lower bound suggesting this is unimprovable. We then present MD-Leftist, our extension of Leftist to the multi-dimensional case, which obtains similar regret bounds but with $d^{2.5} \log d$ and $d^{1.5} \log d$ dependence on dimension for the two types of bounds respectively. Finally, we experimentally evaluate Leftist.
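To make the problem setting concrete, here is a minimal sketch of the environment the abstract describes: arms live in $[0, 1]^d$, the Bernoulli reward mean jumps at a separating hyperplane, and each pull incurs a cost linear in the arm's components. The parameter names (`w`, `b`, `mu_low`, `mu_high`, `c`) and the specific values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance of a thresholded linear bandit environment.
# Reward is Bernoulli with mean mu_low on one side of the hyperplane
# w @ x >= b and mu_high on the other; cost is a positive linear
# combination c @ x of the arm's components.
d = 3
w = rng.random(d)
w /= w.sum()                   # hyperplane normal (assumed normalized)
b = 0.5                        # threshold offset
mu_low, mu_high = 0.2, 0.8     # piecewise-constant Bernoulli means
c = rng.random(d) + 0.1        # strictly positive cost weights

def pull(x):
    """Play arm x in [0, 1]^d; return (Bernoulli reward, linear cost)."""
    mean = mu_high if w @ x >= b else mu_low
    reward = rng.binomial(1, mean)
    cost = float(c @ x)
    return reward, cost
```

A learner such as Leftist would query arms so as to locate the hyperplane while trading off the linear cost of each pull against the chance of the higher reward level.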

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-mehta23a,
  title     = {Thresholded linear bandits},
  author    = {Mehta, Nishant A. and Komiyama, Junpei and Potluru, Vamsi K. and Nguyen, Andrea and Grant-Hagen, Mica},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {6968--7020},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/mehta23a/mehta23a.pdf},
  url       = {https://proceedings.mlr.press/v206/mehta23a.html},
  abstract  = {We introduce the thresholded linear bandit problem, a novel sequential decision making problem at the interface of structured stochastic multi-armed bandits and learning halfspaces. The set of arms is $[0, 1]^d$, the expected Bernoulli reward is piecewise constant with a jump at a separating hyperplane, and each arm is associated with a cost that is a positive linear combination of the arm's components. This problem is motivated by several practical applications. For instance, imagine tuning the continuous features of an offer to a consumer; higher values incur higher cost to the vendor but result in a more attractive offer. At some threshold, the offer is attractive enough for a random consumer to accept at the higher probability level. For the one-dimensional case, we present Leftist, which enjoys $\log^2 T$ problem-dependent regret in favorable cases and has $\log(T) \sqrt{T}$ worst-case regret; we also give a lower bound suggesting this is unimprovable. We then present MD-Leftist, our extension of Leftist to the multi-dimensional case, which obtains similar regret bounds but with $d^{2.5} \log d$ and $d^{1.5} \log d$ dependence on dimension for the two types of bounds respectively. Finally, we experimentally evaluate Leftist.}
}
Endnote
%0 Conference Paper
%T Thresholded linear bandits
%A Nishant A. Mehta
%A Junpei Komiyama
%A Vamsi K. Potluru
%A Andrea Nguyen
%A Mica Grant-Hagen
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-mehta23a
%I PMLR
%P 6968--7020
%U https://proceedings.mlr.press/v206/mehta23a.html
%V 206
%X We introduce the thresholded linear bandit problem, a novel sequential decision making problem at the interface of structured stochastic multi-armed bandits and learning halfspaces. The set of arms is $[0, 1]^d$, the expected Bernoulli reward is piecewise constant with a jump at a separating hyperplane, and each arm is associated with a cost that is a positive linear combination of the arm's components. This problem is motivated by several practical applications. For instance, imagine tuning the continuous features of an offer to a consumer; higher values incur higher cost to the vendor but result in a more attractive offer. At some threshold, the offer is attractive enough for a random consumer to accept at the higher probability level. For the one-dimensional case, we present Leftist, which enjoys $\log^2 T$ problem-dependent regret in favorable cases and has $\log(T) \sqrt{T}$ worst-case regret; we also give a lower bound suggesting this is unimprovable. We then present MD-Leftist, our extension of Leftist to the multi-dimensional case, which obtains similar regret bounds but with $d^{2.5} \log d$ and $d^{1.5} \log d$ dependence on dimension for the two types of bounds respectively. Finally, we experimentally evaluate Leftist.
APA
Mehta, N.A., Komiyama, J., Potluru, V.K., Nguyen, A. & Grant-Hagen, M. (2023). Thresholded linear bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:6968-7020. Available from https://proceedings.mlr.press/v206/mehta23a.html.