Computational Bottlenecks of Training Small-scale Large Language Models

Saleh Ashkboos, Seyed Iman Mirzadeh, Keivan Alizadeh-Vahid, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar, Fartash Faghri
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:14-21, 2024.

Abstract

While large language models (LLMs) dominate the AI landscape, Small-scale Large Language Models (SLMs) are gaining attention due to cost and efficiency demands from consumers. However, there is limited research on the training behavior and computational requirements of SLMs. In this study, we explore the computational bottlenecks of training SLMs (up to 2B parameters) by examining the effects of various hyperparameters and configurations, including GPU type, batch size, model size, communication protocol, attention type, and the number of GPUs. We assess these factors on popular cloud services using metrics such as loss per dollar and tokens per second. Our findings aim to support the broader adoption and optimization of language model training for low-resource AI research institutes.
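
The two cost-efficiency metrics named in the abstract, tokens per second and loss per dollar, can be made concrete with a short sketch. The Python snippet below is illustrative only and is not the authors' code; the batch size, sequence length, step time, GPU hourly price, and loss values are hypothetical placeholders.

    # Illustrative sketch (not the authors' code): computing throughput and
    # cost-efficiency metrics from hypothetical training measurements.

    def tokens_per_second(batch_size: int, seq_len: int,
                          num_gpus: int, step_time_s: float) -> float:
        """Training throughput: tokens processed per wall-clock second across all GPUs."""
        return batch_size * seq_len * num_gpus / step_time_s

    def loss_reduction_per_dollar(loss_start: float, loss_end: float,
                                  wall_clock_hours: float, gpu_hourly_usd: float,
                                  num_gpus: int) -> float:
        """Drop in training loss obtained per dollar of cloud compute spent."""
        cost_usd = wall_clock_hours * gpu_hourly_usd * num_gpus
        return (loss_start - loss_end) / cost_usd

    if __name__ == "__main__":
        # Hypothetical run: 8 GPUs, per-GPU batch of 16 sequences of 2048 tokens,
        # 0.9 s per step, $2.50 per GPU-hour, loss falling from 3.2 to 2.6 over 24 h.
        tps = tokens_per_second(batch_size=16, seq_len=2048, num_gpus=8, step_time_s=0.9)
        lpd = loss_reduction_per_dollar(loss_start=3.2, loss_end=2.6,
                                        wall_clock_hours=24.0, gpu_hourly_usd=2.50,
                                        num_gpus=8)
        print(f"throughput: {tps:,.0f} tokens/s")
        print(f"loss reduction per dollar: {lpd:.5f}")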

Cite this Paper


BibTeX
@InProceedings{pmlr-v262-ashkboos24a,
  title     = {Computational Bottlenecks of Training Small-scale Large Language Models},
  author    = {Ashkboos, Saleh and Iman Mirzadeh, Seyed and Alizadeh-Vahid, Keivan and Hossein Sekhavat, Mohammad and Nabi, Moin and Farajtabar, Mehrdad and Faghri, Fartash},
  booktitle = {Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop},
  pages     = {14--21},
  year      = {2024},
  editor    = {Rezagholizadeh, Mehdi and Passban, Peyman and Samiee, Soheila and Partovi Nia, Vahid and Cheng, Yu and Deng, Yue and Liu, Qun and Chen, Boxing},
  volume    = {262},
  series    = {Proceedings of Machine Learning Research},
  month     = {14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v262/main/assets/ashkboos24a/ashkboos24a.pdf},
  url       = {https://proceedings.mlr.press/v262/ashkboos24a.html},
  abstract  = {While large language models (LLMs) dominate the AI landscape, Small-scale large Language Models (SLMs) are gaining attention due to cost and efficiency demands from consumers. However, there is limited research on the training behavior and computational requirements of SLMs. In this study, we explore the computational bottlenecks of training SLMs (up to 2B parameters) by examining the effects of various hyperparameters and configurations, including GPU type, batch size, model size, communication protocol, attention type, and the number of GPUs. We assess these factors on popular cloud services using metrics such as loss per dollar and tokens per second. Our findings aim to support the broader adoption and optimization of language model training for low-resource AI research institutes.}
}
Endnote
%0 Conference Paper
%T Computational Bottlenecks of Training Small-scale Large Language Models
%A Saleh Ashkboos
%A Seyed Iman Mirzadeh
%A Keivan Alizadeh-Vahid
%A Mohammad Hossein Sekhavat
%A Moin Nabi
%A Mehrdad Farajtabar
%A Fartash Faghri
%B Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop
%C Proceedings of Machine Learning Research
%D 2024
%E Mehdi Rezagholizadeh
%E Peyman Passban
%E Soheila Samiee
%E Vahid Partovi Nia
%E Yu Cheng
%E Yue Deng
%E Qun Liu
%E Boxing Chen
%F pmlr-v262-ashkboos24a
%I PMLR
%P 14--21
%U https://proceedings.mlr.press/v262/ashkboos24a.html
%V 262
%X While large language models (LLMs) dominate the AI landscape, Small-scale large Language Models (SLMs) are gaining attention due to cost and efficiency demands from consumers. However, there is limited research on the training behavior and computational requirements of SLMs. In this study, we explore the computational bottlenecks of training SLMs (up to 2B parameters) by examining the effects of various hyperparameters and configurations, including GPU type, batch size, model size, communication protocol, attention type, and the number of GPUs. We assess these factors on popular cloud services using metrics such as loss per dollar and tokens per second. Our findings aim to support the broader adoption and optimization of language model training for low-resource AI research institutes.
APA
Ashkboos, S., Iman Mirzadeh, S., Alizadeh-Vahid, K., Hossein Sekhavat, M., Nabi, M., Farajtabar, M. & Faghri, F. (2024). Computational Bottlenecks of Training Small-scale Large Language Models. Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, in Proceedings of Machine Learning Research 262:14-21. Available from https://proceedings.mlr.press/v262/ashkboos24a.html.