Coactive Learning for Large Language Models using Implicit User Feedback

Aaron David Tucker, Kianté Brantley, Adam Cahall, Thorsten Joachims
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:48809-48822, 2024.

Abstract

We propose coactive learning as a model and feedback mechanism for training large language models (LLMs). The key insight is that users provide implicit feedback whenever they edit the text $y$ proposed by an LLM. While the edited text $\bar y$ is typically not a gold-standard example for supervised training, coactive learning merely requires that the edited text $\bar y$ is an improvement over the proposed text $y$. Note that such weak implicit preference feedback $\bar y \succ y$ is available in many application settings on a per-user basis, thus enabling the personalization of LLMs. In this paper, we develop the theoretical basis for coactive training of non-linear models, and we derive CoRLL as the first coactive learning algorithm for LLMs. Empirical results indicate that CoRLL is effective even for weak and noisy coactive preference feedback, making it a promising algorithm for training and personalization of LLMs from feedback that is naturally collected in many use cases.
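The feedback model can be made concrete with the classic *linear* coactive update, the Preference Perceptron from earlier coactive learning work by Shivaswamy and Joachims. This is not CoRLL: this page does not describe CoRLL's update for non-linear models. The sketch below is a minimal Python illustration under stated assumptions. The feature map `features` is a hypothetical bag-of-words stand-in for a joint feature vector $\phi(x, y)$ and ignores the context $x$. The names `score` and `preference_perceptron_update` are illustrative only.

The only signal the update uses is the pair $\bar y \succ y$. The weights move toward the features of the edited text and away from those of the proposed text, so $\bar y$ never has to be a gold-standard output.

```python
from collections import Counter


def features(text: str) -> Counter:
    """Hypothetical feature map phi(x, y): bag-of-words counts of y.
    The context x is ignored to keep the sketch small (an assumption)."""
    return Counter(text.lower().split())


def score(w: dict, text: str) -> float:
    """Linear utility w . phi(x, y) under the current weights."""
    return sum(w.get(tok, 0.0) * cnt for tok, cnt in features(text).items())


def preference_perceptron_update(w: dict, proposed: str, edited: str) -> dict:
    """One linear coactive step: w <- w + phi(x, y_bar) - phi(x, y).
    Only requires that the edit `edited` is better than `proposed`."""
    new_w = dict(w)  # leave the caller's weights untouched
    for tok, cnt in features(edited).items():
        new_w[tok] = new_w.get(tok, 0.0) + cnt
    for tok, cnt in features(proposed).items():
        new_w[tok] = new_w.get(tok, 0.0) - cnt
    return new_w


if __name__ == "__main__":
    proposed = "the meeting is on monday"   # text y proposed by the model
    edited = "the meeting is on tuesday"    # user's edit y_bar, with y_bar > y
    w = preference_perceptron_update({}, proposed, edited)
    print(score(w, edited), score(w, proposed))  # prints: 1.0 -1.0
```

In this example the words shared by $y$ and $\bar y$ cancel, and only the changed token moves: "tuesday" gains weight and "monday" loses it. After a single edit, the edited text therefore outscores the proposal.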

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-tucker24a,
  title = {Coactive Learning for Large Language Models using Implicit User Feedback},
  author = {Tucker, Aaron David and Brantley, Kiant\'{e} and Cahall, Adam and Joachims, Thorsten},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {48809--48822},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/tucker24a/tucker24a.pdf},
  url = {https://proceedings.mlr.press/v235/tucker24a.html},
  abstract = {We propose coactive learning as a model and feedback mechanism for training large language models (LLMs). The key insight is that users provide implicit feedback whenever they edit the text $y$ proposed by an LLM. While the edited text $\bar y$ is typically not a gold-standard example for supervised training, coactive learning merely requires that the edited text $\bar y$ is an improvement over the proposed text $y$. Note that such weak implicit preference feedback $\bar y \succ y$ is available in many application settings on a per-user basis, thus enabling the personalization of LLMs. In this paper, we develop the theoretical basis for coactive training of non-linear models, and we derive CoRLL as the first coactive learning algorithm for LLMs. Empirical results indicate that CoRLL is effective even for weak and noisy coactive preference feedback, making it a promising algorithm for training and personalization of LLMs from feedback that is naturally collected in many use cases.}
}
Endnote
%0 Conference Paper
%T Coactive Learning for Large Language Models using Implicit User Feedback
%A Aaron David Tucker
%A Kianté Brantley
%A Adam Cahall
%A Thorsten Joachims
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-tucker24a
%I PMLR
%P 48809--48822
%U https://proceedings.mlr.press/v235/tucker24a.html
%V 235
%X We propose coactive learning as a model and feedback mechanism for training large language models (LLMs). The key insight is that users provide implicit feedback whenever they edit the text $y$ proposed by an LLM. While the edited text $\bar y$ is typically not a gold-standard example for supervised training, coactive learning merely requires that the edited text $\bar y$ is an improvement over the proposed text $y$. Note that such weak implicit preference feedback $\bar y \succ y$ is available in many application settings on a per-user basis, thus enabling the personalization of LLMs. In this paper, we develop the theoretical basis for coactive training of non-linear models, and we derive CoRLL as the first coactive learning algorithm for LLMs. Empirical results indicate that CoRLL is effective even for weak and noisy coactive preference feedback, making it a promising algorithm for training and personalization of LLMs from feedback that is naturally collected in many use cases.
APA
Tucker, A.D., Brantley, K., Cahall, A. & Joachims, T. (2024). Coactive Learning for Large Language Models using Implicit User Feedback. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:48809-48822. Available from https://proceedings.mlr.press/v235/tucker24a.html.
