Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates

Xue Wang; Mingcheng Wei; Tao Yao

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5200-5208, 2018.

Abstract

In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in sample size T, O(log T), and further attains a tighter bound in both covariate dimension d and the number of significant covariates s, O(s^2 (s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments to benchmark the proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm compares favorably with the other benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.
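The 2-step Weighted Lasso idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the standard MCP form from the literature, P(t) = λ|t| − t²/(2a) for |t| ≤ aλ and aλ²/2 otherwise, and a generic local linear approximation in which a plain Lasso fit supplies weights P'(|β_j|) for a second, weighted Lasso fit. The function names (`mcp_penalty`, `mcp_derivative`, `weighted_lasso`, `two_step_weighted_lasso`), the coordinate-descent solver, and the synthetic i.i.d. data are all hypothetical choices made for illustration; the paper itself treats non-i.i.d. bandit samples.

```python
import numpy as np

def mcp_penalty(t, lam, a):
    """MCP value (standard form assumed): lam*|t| - t^2/(2a) up to a*lam, then flat."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= a * lam, lam * t - t**2 / (2 * a), a * lam**2 / 2)

def mcp_derivative(t, lam, a):
    """Derivative of the MCP in |t|: (lam - |t|/a)_+, zero for large coefficients."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.maximum(lam - t / a, 0.0)

def weighted_lasso(X, y, weights, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + sum_j weights[j]*|b_j| by coordinate descent."""
    n, d = X.shape
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # residual y - X @ beta, with beta = 0
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - weights[j], 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def two_step_weighted_lasso(X, y, lam, a):
    """Step 1: plain Lasso. Step 2: Lasso reweighted by the MCP derivative (assumed LLA)."""
    d = X.shape[1]
    beta0 = weighted_lasso(X, y, np.full(d, lam))
    weights = mcp_derivative(beta0, lam, a)  # large coefficients get zero penalty
    return weighted_lasso(X, y, weights)

# Illustrative noise-free data: 3 significant covariates out of 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
beta_true = np.zeros(20)
beta_true[:3] = 5.0
y = X @ beta_true
beta_hat = two_step_weighted_lasso(X, y, lam=0.1, a=3.0)
```

In this sketch, the second step leaves large first-step coefficients unpenalized, which is why an MCP-type estimator can avoid the shrinkage bias of a plain Lasso on the significant covariates.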

Cite this Paper

BibTeX

@InProceedings{pmlr-v80-wang18j,
  title = 	 {Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates},
  author =       {Wang, Xue and Wei, Mingcheng and Yao, Tao},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {5200--5208},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/wang18j/wang18j.pdf},
  url = 	 {https://proceedings.mlr.press/v80/wang18j.html},
  abstract = 	 {In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in sample size T, O(log T), and further attains a tighter bound in both covariate dimension d and the number of significant covariates s, O(s^2 (s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments to benchmark the proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm compares favorably with the other benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.}
}

Endnote

%0 Conference Paper
%T Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates
%A Xue Wang
%A Mingcheng Wei
%A Tao Yao
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-wang18j
%I PMLR
%P 5200--5208
%U https://proceedings.mlr.press/v80/wang18j.html
%V 80
%X In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in sample size T, O(log T), and further attains a tighter bound in both covariate dimension d and the number of significant covariates s, O(s^2 (s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments to benchmark the proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm compares favorably with the other benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.

APA

Wang, X., Wei, M. & Yao, T. (2018). Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5200-5208. Available from https://proceedings.mlr.press/v80/wang18j.html.
