On Preference-based Stochastic Linear Contextual Bandits with Knapsacks

Xin Liu
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2890-2898, 2025.

Abstract

This paper studies the problem of preference-based stochastic linear contextual bandits with knapsack constraints (PbLinCBwK). We propose budget-aware optimistic and randomized exploration algorithms that achieve a regret of ${O}((\kappa+\frac{T\nu^*}{B})\sqrt{T}\log T)$ for any total budget $B=\Omega(\sqrt{T})$. The parameters $\kappa$ and $\frac{T\nu^*}{B}$ capture the effects of preference feedback and knapsack constraints, respectively. Our regret bound is near-optimal and matches that of LinCBwK under the mild condition $B=\Omega(\sqrt{T})$. To achieve these results, we view the process of budget consumption and stopping time as a Markov process and analyze it via the Lyapunov drift method, which translates into strong regret guarantees. Experiments on synthetic PbLinCBwK instances and an online content moderation setting further support the theoretical results.
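The budget/stopping-time dynamic described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm (which handles linear contexts and preference feedback); it is a toy UCB-style learner with K arms, known per-pull costs, and a hard budget B that stops once the remaining budget can no longer safely cover another pull. All names and parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K arms with unknown Bernoulli reward means and
# known per-pull costs; the learner runs until the budget B is spent.
K, T, B = 3, 1000, 200.0
true_rewards = np.array([0.3, 0.5, 0.7])
true_costs = np.array([0.4, 0.5, 0.6])

counts = np.zeros(K)
reward_sums = np.zeros(K)
budget = B
total_reward = 0.0

for t in range(1, T + 1):
    if budget <= 1.0:  # stopping time: budget may not cover another pull
        break
    # Optimistic (UCB-style) reward estimates; untried arms get +inf
    ucb = np.where(
        counts > 0,
        reward_sums / np.maximum(counts, 1)
        + np.sqrt(2 * np.log(t) / np.maximum(counts, 1)),
        np.inf,
    )
    # Budget-aware score: optimistic reward per unit of (known) cost
    arm = int(np.argmax(ucb / true_costs))
    r = float(rng.random() < true_rewards[arm])  # Bernoulli reward draw
    counts[arm] += 1
    reward_sums[arm] += r
    budget -= true_costs[arm]
    total_reward += r
```

With these costs the loop terminates well before round T, which is the point of the stopping-time view: the horizon is determined by budget consumption, not by T alone.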

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-liu25g,
  title = {On Preference-based Stochastic Linear Contextual Bandits with Knapsacks},
  author = {Liu, Xin},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {2890--2898},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/liu25g/liu25g.pdf},
  url = {https://proceedings.mlr.press/v258/liu25g.html},
  abstract = {This paper studies the problem of preference-based stochastic linear contextual bandits with knapsack constraints (PbLinCBwK). We propose budget-aware optimistic and randomized exploration algorithms that achieve a regret of ${O}((\kappa+\frac{T\nu^*}{B})\sqrt{T}\log T)$ for any total budget $B=\Omega(\sqrt{T})$. The parameters $\kappa$ and $\frac{T\nu^*}{B}$ capture the effects of preference feedback and knapsack constraints, respectively. Our regret bound is near-optimal and matches that of LinCBwK under the mild condition $B=\Omega(\sqrt{T})$. To achieve these results, we view the process of budget consumption and stopping time as a Markov process and analyze it via the Lyapunov drift method, which translates into strong regret guarantees. Experiments on synthetic PbLinCBwK instances and an online content moderation setting further support the theoretical results.}
}
Endnote
%0 Conference Paper
%T On Preference-based Stochastic Linear Contextual Bandits with Knapsacks
%A Xin Liu
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-liu25g
%I PMLR
%P 2890--2898
%U https://proceedings.mlr.press/v258/liu25g.html
%V 258
%X This paper studies the problem of preference-based stochastic linear contextual bandits with knapsack constraints (PbLinCBwK). We propose budget-aware optimistic and randomized exploration algorithms that achieve a regret of ${O}((\kappa+\frac{T\nu^*}{B})\sqrt{T}\log T)$ for any total budget $B=\Omega(\sqrt{T})$. The parameters $\kappa$ and $\frac{T\nu^*}{B}$ capture the effects of preference feedback and knapsack constraints, respectively. Our regret bound is near-optimal and matches that of LinCBwK under the mild condition $B=\Omega(\sqrt{T})$. To achieve these results, we view the process of budget consumption and stopping time as a Markov process and analyze it via the Lyapunov drift method, which translates into strong regret guarantees. Experiments on synthetic PbLinCBwK instances and an online content moderation setting further support the theoretical results.
APA
Liu, X. (2025). On Preference-based Stochastic Linear Contextual Bandits with Knapsacks. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2890-2898. Available from https://proceedings.mlr.press/v258/liu25g.html.