Dissimilarity Bandits

Paolo Battellani; Alberto Maria Metelli; Francesco Trovò

Dissimilarity Bandits

Paolo Battellani, Alberto Maria Metelli, Francesco Trovò

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3637-3645, 2024.

Abstract

We study a novel sequential decision-making setting, namely the dissimilarity bandits. At each round, the learner pulls an arm that provides a stochastic d-dimensional observation vector. The learner aims to identify the pair of arms with the maximum dissimilarity, where such an index is computed over pairs of expected observation vectors. We propose Successive Elimination for Dissimilarity (SED), a fixed-confidence best-pair identification algorithm based on sequential elimination. SED discards individual arms when there is statistical evidence that they cannot belong to a pair of most dissimilar arms and, thus, effectively exploits the structure of the setting by reusing the estimates of the expected observation vectors. We provide results on the sample complexity of SED, depending on {HP}, a novel index characterizing the complexity of identifying the pair of the most dissimilar arms. Then, we provide a sample complexity lower bound, highlighting the challenges of the identification problem for dissimilarity bandits, which is almost matched by our SED. Finally, we compare our approach over synthetically generated data and a realistic environmental monitoring domain against classical and combinatorial best-arm identification algorithms for the cases $d=1$ and $d>1$.

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-battellani24a,
  title = 	 {Dissimilarity Bandits},
  author =       {Battellani, Paolo and Maria Metelli, Alberto and Trov\`{o}, Francesco},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3637--3645},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/battellani24a/battellani24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/battellani24a.html},
  abstract = 	 {We study a novel sequential decision-making setting, namely the dissimilarity bandits. At each round, the learner pulls an arm that provides a stochastic d-dimensional observation vector. The learner aims to identify the pair of arms with the maximum dissimilarity, where such an index is computed over pairs of expected observation vectors. We propose Successive Elimination for Dissimilarity (SED), a fixed-confidence best-pair identification algorithm based on sequential elimination. SED discards individual arms when there is statistical evidence that they cannot belong to a pair of most dissimilar arms and, thus, effectively exploits the structure of the setting by reusing the estimates of the expected observation vectors. We provide results on the sample complexity of SED, depending on {HP}, a novel index characterizing the complexity of identifying the pair of the most dissimilar arms. Then, we provide a sample complexity lower bound, highlighting the challenges of the identification problem for dissimilarity bandits, which is almost matched by our SED. Finally, we compare our approach over synthetically generated data and a realistic environmental monitoring domain against classical and combinatorial best-arm identification algorithms for the cases $d=1$ and $d>1$.}
}

Endnote

%0 Conference Paper
%T Dissimilarity Bandits
%A Paolo Battellani
%A Alberto Maria Metelli
%A Francesco Trovò
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li	
%F pmlr-v238-battellani24a
%I PMLR
%P 3637--3645
%U https://proceedings.mlr.press/v238/battellani24a.html
%V 238
%X We study a novel sequential decision-making setting, namely the dissimilarity bandits. At each round, the learner pulls an arm that provides a stochastic d-dimensional observation vector. The learner aims to identify the pair of arms with the maximum dissimilarity, where such an index is computed over pairs of expected observation vectors. We propose Successive Elimination for Dissimilarity (SED), a fixed-confidence best-pair identification algorithm based on sequential elimination. SED discards individual arms when there is statistical evidence that they cannot belong to a pair of most dissimilar arms and, thus, effectively exploits the structure of the setting by reusing the estimates of the expected observation vectors. We provide results on the sample complexity of SED, depending on {HP}, a novel index characterizing the complexity of identifying the pair of the most dissimilar arms. Then, we provide a sample complexity lower bound, highlighting the challenges of the identification problem for dissimilarity bandits, which is almost matched by our SED. Finally, we compare our approach over synthetically generated data and a realistic environmental monitoring domain against classical and combinatorial best-arm identification algorithms for the cases $d=1$ and $d>1$.

APA

Battellani, P., Maria Metelli, A. & Trovò, F.. (2024). Dissimilarity Bandits. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3637-3645 Available from https://proceedings.mlr.press/v238/battellani24a.html.

Dissimilarity Bandits

Abstract

Cite this Paper

Related Material