Online Batch Decision-Making with High-Dimensional Covariates

Chi-Hua Wang, Guang Cheng
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3848-3857, 2020.

Abstract

We propose and investigate a class of new algorithms for sequential decision making that interacts with a batch of users simultaneously instead of a user at each decision epoch. This type of batch models is motivated by interactive marketing and clinical trial, where a group of people are treated simultaneously and the outcomes of the whole group are collected before the next stage of decision. In such a scenario, our goal is to allocate a batch of treatments to maximize treatment efficacy based on observed high-dimensional user covariates. We deliver a solution, named Teamwork LASSO Bandit algorithm, that resolves a batch version of explore-exploit dilemma via switching between teamwork stage and selfish stage during the whole decision process. This is made possible based on statistical properties of LASSO estimate of treatment efficacy that adapts to a sequence of batch observations. In general, a rate of optimal allocation condition is proposed to delineate the exploration and exploitation trade-off on the data collection scheme, which is sufficient for LASSO to identify the optimal treatment for observed user covariates. An upper bound on expected cumulative regret of the proposed algorithm is provided.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-wang20k, title = {Online Batch Decision-Making with High-Dimensional Covariates}, author = {Wang, Chi-Hua and Cheng, Guang}, booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics}, pages = {3848--3857}, year = {2020}, editor = {Chiappa, Silvia and Calandra, Roberto}, volume = {108}, series = {Proceedings of Machine Learning Research}, month = {26--28 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v108/wang20k/wang20k.pdf}, url = {https://proceedings.mlr.press/v108/wang20k.html}, abstract = {We propose and investigate a class of new algorithms for sequential decision making that interacts with a batch of users simultaneously instead of a user at each decision epoch. This type of batch models is motivated by interactive marketing and clinical trial, where a group of people are treated simultaneously and the outcomes of the whole group are collected before the next stage of decision. In such a scenario, our goal is to allocate a batch of treatments to maximize treatment efficacy based on observed high-dimensional user covariates. We deliver a solution, named Teamwork LASSO Bandit algorithm, that resolves a batch version of explore-exploit dilemma via switching between teamwork stage and selfish stage during the whole decision process. This is made possible based on statistical properties of LASSO estimate of treatment efficacy that adapts to a sequence of batch observations. In general, a rate of optimal allocation condition is proposed to delineate the exploration and exploitation trade-off on the data collection scheme, which is sufficient for LASSO to identify the optimal treatment for observed user covariates. An upper bound on expected cumulative regret of the proposed algorithm is provided.} }
Endnote
%0 Conference Paper %T Online Batch Decision-Making with High-Dimensional Covariates %A Chi-Hua Wang %A Guang Cheng %B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2020 %E Silvia Chiappa %E Roberto Calandra %F pmlr-v108-wang20k %I PMLR %P 3848--3857 %U https://proceedings.mlr.press/v108/wang20k.html %V 108 %X We propose and investigate a class of new algorithms for sequential decision making that interacts with a batch of users simultaneously instead of a user at each decision epoch. This type of batch models is motivated by interactive marketing and clinical trial, where a group of people are treated simultaneously and the outcomes of the whole group are collected before the next stage of decision. In such a scenario, our goal is to allocate a batch of treatments to maximize treatment efficacy based on observed high-dimensional user covariates. We deliver a solution, named Teamwork LASSO Bandit algorithm, that resolves a batch version of explore-exploit dilemma via switching between teamwork stage and selfish stage during the whole decision process. This is made possible based on statistical properties of LASSO estimate of treatment efficacy that adapts to a sequence of batch observations. In general, a rate of optimal allocation condition is proposed to delineate the exploration and exploitation trade-off on the data collection scheme, which is sufficient for LASSO to identify the optimal treatment for observed user covariates. An upper bound on expected cumulative regret of the proposed algorithm is provided.
APA
Wang, C. & Cheng, G.. (2020). Online Batch Decision-Making with High-Dimensional Covariates. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:3848-3857 Available from https://proceedings.mlr.press/v108/wang20k.html.

Related Material