A Closer Look at the Limitations of Instruction Tuning

Sreyan Ghosh; Chandra Kiran Reddy Evuru; Sonal Kumar; Ramaneswaran S; Deepali Aneja; Zeyu Jin; Ramani Duraiswami; Dinesh Manocha

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:15559-15589, 2024.

Abstract

Instruction Tuning (IT), the process of training large language models (LLMs) using instruction-response pairs, has emerged as the predominant method for transforming base pre-trained LLMs into open-domain conversational agents. While IT has achieved notable success and widespread adoption, its limitations and shortcomings remain underexplored. In this paper, through rigorous experiments and an in-depth analysis of the changes LLMs undergo through IT, we reveal various limitations of IT. In particular, we show that (1) IT fails to enhance knowledge or skills in LLMs. LoRA fine-tuning is limited to learning response initiation and style tokens, and full-parameter fine-tuning leads to knowledge degradation. (2) Copying response patterns from IT datasets derived from knowledgeable sources leads to a decline in response quality. (3) Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset for generating responses. (4) Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets. We hope the insights and challenges revealed in this paper inspire future work in related directions.
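
The abstract contrasts two fine-tuning regimes. As background only, here is a minimal sketch of the standard LoRA parameterization, written in the Python standard library and not taken from the authors' code. The pre-trained weight W stays frozen and only a low-rank update B·A, scaled by alpha/r, is trained. Full-parameter fine-tuning instead updates every entry of W. The matrix sizes, the rank r, the scaling alpha, and the helper names `matmul`, `lora_effective_weight` and `trainable_params` are illustrative assumptions.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_effective_weight(w: Matrix, b: Matrix, a: Matrix,
                          alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * (B @ A). The frozen W is never modified."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def trainable_params(d_out: int, d_in: int, r: Optional[int] = None) -> int:
    """Full fine-tuning trains all of W (d_out x d_in); LoRA trains only
    B (d_out x r) and A (r x d_in)."""
    if r is None:
        return d_out * d_in
    return d_out * r + r * d_in


# Illustrative sizes and hyperparameters (assumptions, not from the paper).
d_out, d_in, r, alpha = 4, 4, 2, 4.0
rng = random.Random(0)
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]     # trainable
B = [[0.0] * r for _ in range(d_out)]                                   # trainable, zero init

# With B = 0 the adapted layer starts out identical to the pre-trained layer.
assert lora_effective_weight(W, B, A, alpha, r) == W

print(trainable_params(4096, 4096))       # 16777216 (full-parameter)
print(trainable_params(4096, 4096, r=8))  # 65536 (LoRA, r = 8)
```

Because B starts at zero, the adapted layer initially reproduces the base model exactly. For a hypothetical 4096 x 4096 layer, the trainable-parameter count drops from 16,777,216 to 65,536 at r = 8. This only illustrates how much smaller the LoRA update space is. It does not reproduce any experiment from the paper.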

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-ghosh24a,
  title     = {A Closer Look at the Limitations of Instruction Tuning},
  author    = {Ghosh, Sreyan and Evuru, Chandra Kiran Reddy and Kumar, Sonal and S, Ramaneswaran and Aneja, Deepali and Jin, Zeyu and Duraiswami, Ramani and Manocha, Dinesh},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {15559--15589},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ghosh24a/ghosh24a.pdf},
  url       = {https://proceedings.mlr.press/v235/ghosh24a.html},
  abstract  = {Instruction Tuning (IT), the process of training large language models (LLMs) using instruction-response pairs, has emerged as the predominant method for transforming base pre-trained LLMs into open-domain conversational agents. While IT has achieved notable success and widespread adoption, its limitations and shortcomings remain underexplored. In this paper, through rigorous experiments and an in-depth analysis of the changes LLMs undergo through IT, we reveal various limitations of IT. In particular, we show that (1) IT fails to enhance knowledge or skills in LLMs. LoRA fine-tuning is limited to learning response initiation and style tokens, and full-parameter fine-tuning leads to knowledge degradation. (2) Copying response patterns from IT datasets derived from knowledgeable sources leads to a decline in response quality. (3) Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset for generating responses. (4) Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets. We hope the insights and challenges revealed in this paper inspire future work in related directions.}
}

Endnote

%0 Conference Paper
%T A Closer Look at the Limitations of Instruction Tuning
%A Sreyan Ghosh
%A Chandra Kiran Reddy Evuru
%A Sonal Kumar
%A Ramaneswaran S
%A Deepali Aneja
%A Zeyu Jin
%A Ramani Duraiswami
%A Dinesh Manocha
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ghosh24a
%I PMLR
%P 15559--15589
%U https://proceedings.mlr.press/v235/ghosh24a.html
%V 235
%X Instruction Tuning (IT), the process of training large language models (LLMs) using instruction-response pairs, has emerged as the predominant method for transforming base pre-trained LLMs into open-domain conversational agents. While IT has achieved notable success and widespread adoption, its limitations and shortcomings remain underexplored. In this paper, through rigorous experiments and an in-depth analysis of the changes LLMs undergo through IT, we reveal various limitations of IT. In particular, we show that (1) IT fails to enhance knowledge or skills in LLMs. LoRA fine-tuning is limited to learning response initiation and style tokens, and full-parameter fine-tuning leads to knowledge degradation. (2) Copying response patterns from IT datasets derived from knowledgeable sources leads to a decline in response quality. (3) Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset for generating responses. (4) Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets. We hope the insights and challenges revealed in this paper inspire future work in related directions.

APA


Ghosh, S., Evuru, C.K.R., Kumar, S., S, R., Aneja, D., Jin, Z., Duraiswami, R. & Manocha, D. (2024). A Closer Look at the Limitations of Instruction Tuning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:15559-15589. Available from https://proceedings.mlr.press/v235/ghosh24a.html.
