Efficient Exploration for LLMs

Vikranth Dwaracherla; Seyed Mohammad Asghari; Botao Hao; Benjamin Van Roy

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:12215-12227, 2024.

Abstract

We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-dwaracherla24a,
  title = 	 {Efficient Exploration for {LLM}s},
  author =       {Dwaracherla, Vikranth and Asghari, Seyed Mohammad and Hao, Botao and Van Roy, Benjamin},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {12215--12227},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/dwaracherla24a/dwaracherla24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/dwaracherla24a.html},
  abstract = 	 {We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.}
}

Endnote

%0 Conference Paper
%T Efficient Exploration for LLMs
%A Vikranth Dwaracherla
%A Seyed Mohammad Asghari
%A Botao Hao
%A Benjamin Van Roy
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-dwaracherla24a
%I PMLR
%P 12215--12227
%U https://proceedings.mlr.press/v235/dwaracherla24a.html
%V 235
%X We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.

APA


Dwaracherla, V., Asghari, S.M., Hao, B. & Van Roy, B. (2024). Efficient Exploration for LLMs. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:12215-12227. Available from https://proceedings.mlr.press/v235/dwaracherla24a.html.
