Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits

Kwang-Sung Jun; Lalit Jain; Blake Mason; Houssam Nassif

Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits

Kwang-Sung Jun, Lalit Jain, Blake Mason, Houssam Nassif

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5148-5157, 2021.

Abstract

We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds significantly improve upon the state-of-the-art bound by Li et al. (2017) via recent developments of the self-concordant analysis of the logistic loss (Faury et al., 2020). Specifically, our confidence bound avoids a direct dependence on $1/\kappa$, where $\kappa$ is the minimal variance over all arms’ reward distributions. In general, $1/\kappa$ scales exponentially with the norm of the unknown linear parameter $\theta^*$. Instead of relying on this worst case quantity, our confidence bound for the reward of any given arm depends directly on the variance of that arm’s reward distribution. We present two applications of our novel bounds to pure exploration and regret minimization logistic bandits improving upon state-of-the-art performance guarantees. For pure exploration we also provide a lower bound highlighting a dependence on $1/\kappa$ for a family of instances.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-jun21a,
  title = 	 {Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits},
  author =       {Jun, Kwang-Sung and Jain, Lalit and Mason, Blake and Nassif, Houssam},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {5148--5157},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/jun21a/jun21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/jun21a.html},
  abstract = 	 {We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds significantly improve upon the state-of-the-art bound by Li et al. (2017) via recent developments of the self-concordant analysis of the logistic loss (Faury et al., 2020). Specifically, our confidence bound avoids a direct dependence on $1/\kappa$, where $\kappa$ is the minimal variance over all arms’ reward distributions. In general, $1/\kappa$ scales exponentially with the norm of the unknown linear parameter $\theta^*$. Instead of relying on this worst case quantity, our confidence bound for the reward of any given arm depends directly on the variance of that arm’s reward distribution. We present two applications of our novel bounds to pure exploration and regret minimization logistic bandits improving upon state-of-the-art performance guarantees. For pure exploration we also provide a lower bound highlighting a dependence on $1/\kappa$ for a family of instances.}
}

Endnote

%0 Conference Paper
%T Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits
%A Kwang-Sung Jun
%A Lalit Jain
%A Blake Mason
%A Houssam Nassif
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-jun21a
%I PMLR
%P 5148--5157
%U https://proceedings.mlr.press/v139/jun21a.html
%V 139
%X We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds significantly improve upon the state-of-the-art bound by Li et al. (2017) via recent developments of the self-concordant analysis of the logistic loss (Faury et al., 2020). Specifically, our confidence bound avoids a direct dependence on $1/\kappa$, where $\kappa$ is the minimal variance over all arms’ reward distributions. In general, $1/\kappa$ scales exponentially with the norm of the unknown linear parameter $\theta^*$. Instead of relying on this worst case quantity, our confidence bound for the reward of any given arm depends directly on the variance of that arm’s reward distribution. We present two applications of our novel bounds to pure exploration and regret minimization logistic bandits improving upon state-of-the-art performance guarantees. For pure exploration we also provide a lower bound highlighting a dependence on $1/\kappa$ for a family of instances.

APA

Jun, K., Jain, L., Mason, B. & Nassif, H.. (2021). Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5148-5157 Available from https://proceedings.mlr.press/v139/jun21a.html.

Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits

Abstract

Cite this Paper

Related Material