Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates

Xue Wang, Mingcheng Wei, Tao Yao
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5200-5208, 2018.

Abstract

In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in sample size T, O(log T), and further attains a tighter bound in both the covariates dimension d and the number of significant covariates s, O(s^2 (s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments to benchmark our proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm performs favorably over other benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.
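For readers unfamiliar with the penalty named in the title, the minimax concave penalty (MCP) of Zhang (2010) is a standard nonconvex sparsity penalty; the sketch below shows its usual closed form. This is background illustration only, not code from the paper, and the function name and default parameters are our own choices.

```python
import numpy as np

def mcp_penalty(t, lam, gamma=2.0):
    """Minimax concave penalty (Zhang, 2010), applied elementwise.

    For |t| <= gamma*lam the penalty is lam*|t| - t^2/(2*gamma)
    (a concave bridge from the Lasso penalty), and beyond that
    threshold it is constant at gamma*lam^2/2, so large
    coefficients incur no additional shrinkage bias.
    """
    t = np.abs(np.asarray(t, dtype=float))
    concave = lam * t - t**2 / (2.0 * gamma)   # region |t| <= gamma*lam
    flat = gamma * lam**2 / 2.0                # constant tail region
    return np.where(t <= gamma * lam, concave, flat)
```

Near zero the penalty behaves like the Lasso's lam*|t|, which is what makes a weighted-Lasso linear approximation (as in the paper's 2-step Weighted Lasso procedure) a natural way to compute MCP estimates.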

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-wang18j, title = {Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates}, author = {Wang, Xue and Wei, Mingcheng and Yao, Tao}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {5200--5208}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/wang18j/wang18j.pdf}, url = {https://proceedings.mlr.press/v80/wang18j.html}, abstract = {In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in sample size T, O(log T), and further attains a tighter bound in both the covariates dimension d and the number of significant covariates s, O(s^2 (s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments to benchmark our proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm performs favorably over other benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.} }
Endnote
%0 Conference Paper %T Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates %A Xue Wang %A Mingcheng Wei %A Tao Yao %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-wang18j %I PMLR %P 5200--5208 %U https://proceedings.mlr.press/v80/wang18j.html %V 80 %X In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in sample size T, O(log T), and further attains a tighter bound in both the covariates dimension d and the number of significant covariates s, O(s^2 (s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments to benchmark our proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm performs favorably over other benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.
APA
Wang, X., Wei, M. & Yao, T. (2018). Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5200-5208. Available from https://proceedings.mlr.press/v80/wang18j.html.

Related Material