Training Transformers Together

Alexander Borzunov, Max Ryabinin, Tim Dettmers, Quentin Lhoest, Lucile Saulnier, Michael Diskin, Yacine Jernite, Thomas Wolf
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, PMLR 176:335-342, 2022.

Abstract

The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, we collaboratively trained a text-to-image transformer similar to OpenAI DALL-E. We invited the viewers to join the ongoing training run, showing them instructions on how to contribute using the available hardware. We explained how to address the engineering challenges associated with such a training run (slow communication, limited memory, uneven performance between devices, and security concerns) and discussed how the viewers can set up collaborative training runs themselves. Finally, we show that the resulting model generates images of reasonable quality on a number of prompts.
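
The abstract above does not include code. As a rough, illustrative sketch of the kind of volunteer setup it describes, the snippet below uses the authors' open-source hivemind library, which the demonstration builds on; the toy model, run name, and batch sizes here are assumptions for illustration, not values from the paper.

    import torch
    import hivemind

    # Stand-in model: the real demonstration trains a DALL-E-like text-to-image transformer.
    model = torch.nn.Linear(512, 512)
    local_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Start (or join) the distributed hash table that lets volunteer peers find each other
    # over the Internet. To join an existing run, pass initial_peers=[...] with a known address.
    dht = hivemind.DHT(start=True)

    # Wrap the local optimizer so gradients are accumulated across peers and the shared model
    # is updated only once the swarm has collectively processed target_batch_size samples.
    opt = hivemind.Optimizer(
        dht=dht,
        run_id="demo_run",            # peers sharing this id train the same model (placeholder name)
        optimizer=local_opt,
        batch_size_per_step=16,       # samples this peer contributes per local step
        target_batch_size=4096,       # global batch accumulated before a synchronized update (assumed value)
        use_local_updates=False,
        verbose=True,
    )

    for step in range(100):
        x, y = torch.randn(16, 512), torch.randn(16, 512)   # toy data in place of a real dataset
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        opt.zero_grad()

With this kind of wrapper, slow or unreliable volunteer devices can join and leave the run without blocking the others, which is the setting the abstract refers to.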

Cite this Paper


BibTeX
@InProceedings{pmlr-v176-borzunov22a,
  title     = {Training Transformers Together},
  author    = {Borzunov, Alexander and Ryabinin, Max and Dettmers, Tim and Lhoest, Quentin and Saulnier, Lucile and Diskin, Michael and Jernite, Yacine and Wolf, Thomas},
  booktitle = {Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track},
  pages     = {335--342},
  year      = {2022},
  editor    = {Kiela, Douwe and Ciccone, Marco and Caputo, Barbara},
  volume    = {176},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--14 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v176/borzunov22a/borzunov22a.pdf},
  url       = {https://proceedings.mlr.press/v176/borzunov22a.html},
  abstract  = {The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, we collaboratively trained a text-to-image transformer similar to OpenAI DALL-E. We invited the viewers to join the ongoing training run, showing them instructions on how to contribute using the available hardware. We explained how to address the engineering challenges associated with such a training run (slow communication, limited memory, uneven performance between devices, and security concerns) and discussed how the viewers can set up collaborative training runs themselves. Finally, we show that the resulting model generates images of reasonable quality on a number of prompts.}
}
Endnote
%0 Conference Paper
%T Training Transformers Together
%A Alexander Borzunov
%A Max Ryabinin
%A Tim Dettmers
%A Quentin Lhoest
%A Lucile Saulnier
%A Michael Diskin
%A Yacine Jernite
%A Thomas Wolf
%B Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track
%C Proceedings of Machine Learning Research
%D 2022
%E Douwe Kiela
%E Marco Ciccone
%E Barbara Caputo
%F pmlr-v176-borzunov22a
%I PMLR
%P 335--342
%U https://proceedings.mlr.press/v176/borzunov22a.html
%V 176
%X The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, we collaboratively trained a text-to-image transformer similar to OpenAI DALL-E. We invited the viewers to join the ongoing training run, showing them instructions on how to contribute using the available hardware. We explained how to address the engineering challenges associated with such a training run (slow communication, limited memory, uneven performance between devices, and security concerns) and discussed how the viewers can set up collaborative training runs themselves. Finally, we show that the resulting model generates images of reasonable quality on a number of prompts.
APA
Borzunov, A., Ryabinin, M., Dettmers, T., Lhoest, Q., Saulnier, L., Diskin, M., Jernite, Y., & Wolf, T. (2022). Training Transformers Together. Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, in Proceedings of Machine Learning Research 176:335-342. Available from https://proceedings.mlr.press/v176/borzunov22a.html.
