Covariance-adapting algorithm for semi-bandits with application to sparse outcomes

Pierre Perrault, Michal Valko, Vianney Perchet
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:3152-3184, 2020.

Abstract

We investigate \emph{stochastic combinatorial semi-bandits}, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in standard bandits). The distributions typically considered depend on specific parameter values, whose prior knowledge is required in theory but is quite difficult to obtain in practice; an example is the commonly assumed \emph{sub-Gaussian} family. We alleviate this issue by instead considering a new general family of \emph{sub-exponential} distributions, which contains both bounded and Gaussian distributions. We prove a new lower bound on the regret over this family, parameterized by the \emph{unknown} covariance matrix, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates and provide a tight asymptotic analysis of its regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.
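As a rough illustration of the variance-adaptive idea behind the paper (not the authors' algorithm), the sketch below runs a Bernstein-style UCB over correlated Gaussian outcomes in a top-$m$ semi-bandit. All dimensions, means, the covariance matrix, and the exploration bonus are illustrative assumptions; a genuinely covariance-adapting method, as in the paper, would also exploit estimated off-diagonal terms.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, T = 5, 2, 2000                        # d base arms, play m per round
mu = np.array([0.9, 0.8, 0.5, 0.4, 0.3])    # illustrative mean vector
# Correlated Gaussian outcomes (the paper's setting is broader: sub-exponential)
cov = 0.05 * (0.5 * np.eye(d) + 0.5 * np.ones((d, d)))
L = np.linalg.cholesky(cov)

counts = np.zeros(d)
sums = np.zeros(d)
sq_sums = np.zeros(d)

for t in range(1, T + 1):
    n = np.maximum(counts, 1)
    means = sums / n
    # Empirical per-arm variance (clipped at 0 for numerical safety); a full
    # covariance-adapting method would also estimate off-diagonal terms
    var = np.maximum((sq_sums - sums ** 2 / n) / np.maximum(counts - 1, 1), 0.0)
    # Bernstein-style, variance-adaptive exploration bonus
    bonus = np.sqrt(2 * var * np.log(t) / n) + 3 * np.log(t) / n
    ucb = np.where(counts > 0, means + bonus, np.inf)   # force initial exploration
    action = np.argsort(ucb)[-m:]           # greedy top-m super-arm
    outcome = mu + L @ rng.standard_normal(d)
    counts[action] += 1                     # semi-bandit feedback: observe
    sums[action] += outcome[action]         # every played base arm
    sq_sums[action] += outcome[action] ** 2

print(sorted(np.argsort(counts)[-m:]))      # indices of the most-played arms
```

With these (assumed) gaps and variances, the two best base arms dominate the play counts after a few hundred rounds; a smaller empirical variance directly shrinks the bonus and thus the exploration cost, which is the adaptivity the abstract refers to.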

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-perrault20a,
  title     = {Covariance-adapting algorithm for semi-bandits with application to sparse outcomes},
  author    = {Perrault, Pierre and Valko, Michal and Perchet, Vianney},
  pages     = {3152--3184},
  year      = {2020},
  editor    = {Jacob Abernethy and Shivani Agarwal},
  volume    = {125},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v125/perrault20a/perrault20a.pdf},
  url       = {http://proceedings.mlr.press/v125/perrault20a.html},
  abstract  = {We investigate \emph{stochastic combinatorial semi-bandits}, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in the standard bandits). Typical distributions considered depend on specific parameter values, whose prior knowledge is required in theory but quite difficult to estimate in practice; an example is the commonly assumed \emph{sub-Gaussian} family. We alleviate this issue by instead considering a new general family of \emph{sub-exponential} distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the regret on this family, that is parameterized by the \emph{unknown} covariance matrix, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of the regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.}
}
Endnote
%0 Conference Paper
%T Covariance-adapting algorithm for semi-bandits with application to sparse outcomes
%A Pierre Perrault
%A Michal Valko
%A Vianney Perchet
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-perrault20a
%I PMLR
%J Proceedings of Machine Learning Research
%P 3152--3184
%U http://proceedings.mlr.press
%V 125
%W PMLR
%X We investigate \emph{stochastic combinatorial semi-bandits}, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in the standard bandits). Typical distributions considered depend on specific parameter values, whose prior knowledge is required in theory but quite difficult to estimate in practice; an example is the commonly assumed \emph{sub-Gaussian} family. We alleviate this issue by instead considering a new general family of \emph{sub-exponential} distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the regret on this family, that is parameterized by the \emph{unknown} covariance matrix, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of the regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.
APA
Perrault, P., Valko, M. & Perchet, V. (2020). Covariance-adapting algorithm for semi-bandits with application to sparse outcomes. Proceedings of Thirty Third Conference on Learning Theory, in PMLR 125:3152-3184.