MicroNet for Efficient Language Modeling

Zhongxia Yan, Hanrui Wang, Demi Guo, Song Han
Proceedings of the NeurIPS 2019 Competition and Demonstration Track, PMLR 123:215-231, 2020.

Abstract

It is important to design compact language models for efficient deployment. We improve upon recent advances in both the language modeling and model-compression domains to construct parameter- and computation-efficient language models. We use an efficient transformer-based architecture with adaptive embedding and softmax, differentiable non-parametric cache, Hebbian softmax, knowledge distillation, network pruning, and low-bit quantization. In this paper, we provide the winning solution to the NeurIPS 2019 MicroNet Challenge in the language modeling track. Compared to the baseline language model provided by the MicroNet Challenge, our model is 90 times more parameter-efficient and 36 times more computation-efficient while achieving the required test perplexity of 35 on the Wikitext-103 dataset. We hope that this work will aid future research into efficient language models, and we have released our full source code at https://github.com/mit-han-lab/neurips-micronet.
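As a minimal illustrative sketch (not the authors' implementation; see the released repository above for the actual code), PyTorch's built-in nn.AdaptiveLogSoftmaxWithLoss shows the adaptive-softmax idea named in the abstract: frequent tokens get a full-width output head, while rarer tokens are routed to smaller tail clusters, which is where most of the parameter and computation savings over a full softmax come from. The hidden size, vocabulary size, and cutoffs below are assumed for illustration and are not the paper's settings.

import torch
import torch.nn as nn

# Illustrative, assumed hyperparameters -- not the paper's configuration.
d_model = 256                       # hidden size of a hypothetical transformer LM
vocab_size = 267_735                # Wikitext-103-scale vocabulary
cutoffs = [20_000, 60_000]          # frequency-band boundaries for the tail clusters

# Token ids below the first cutoff use the full d_model-wide head; rarer bands
# use projections shrunk by div_value per cluster, reducing parameters and FLOPs.
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=d_model,
    n_classes=vocab_size,
    cutoffs=cutoffs,
    div_value=4.0,
)

hidden = torch.randn(32, d_model)              # 32 hidden states from the transformer
targets = torch.randint(0, vocab_size, (32,))  # next-token labels

out = adaptive_softmax(hidden, targets)
print(out.loss)                                # average negative log-likelihood

Pairing such an adaptive output layer with an adaptive input embedding, as the abstract describes, applies the same frequency-banded capacity reduction to the embedding table as well.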

Cite this Paper


BibTeX
@InProceedings{pmlr-v123-yan20a,
  title     = {MicroNet for Efficient Language Modeling},
  author    = {Yan, Zhongxia and Wang, Hanrui and Guo, Demi and Han, Song},
  booktitle = {Proceedings of the NeurIPS 2019 Competition and Demonstration Track},
  pages     = {215--231},
  year      = {2020},
  editor    = {Escalante, Hugo Jair and Hadsell, Raia},
  volume    = {123},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--14 Dec},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v123/yan20a/yan20a.pdf},
  url       = {https://proceedings.mlr.press/v123/yan20a.html}
}
Endnote
%0 Conference Paper
%T MicroNet for Efficient Language Modeling
%A Zhongxia Yan
%A Hanrui Wang
%A Demi Guo
%A Song Han
%B Proceedings of the NeurIPS 2019 Competition and Demonstration Track
%C Proceedings of Machine Learning Research
%D 2020
%E Hugo Jair Escalante
%E Raia Hadsell
%F pmlr-v123-yan20a
%I PMLR
%P 215--231
%U https://proceedings.mlr.press/v123/yan20a.html
%V 123
APA
Yan, Z., Wang, H., Guo, D. & Han, S. (2020). MicroNet for Efficient Language Modeling. Proceedings of the NeurIPS 2019 Competition and Demonstration Track, in Proceedings of Machine Learning Research 123:215-231. Available from https://proceedings.mlr.press/v123/yan20a.html.