Modeling Language Tokens as Functionals of Semantic Fields

Zhengqi Pei, Anran Zhang, Shuhui Wang, Qingming Huang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:40114-40128, 2024.

Abstract

Recent advances in natural language processing have relied heavily on Transformer-based language models. However, Transformers often require large parameter sizes and model depth. Existing Transformer-free approaches using state-space models demonstrate superiority over Transformers, yet they still lack a neuro-biological connection to the human brain. This paper proposes LasF, representing Language tokens as Functionals of semantic fields, to simulate neuronal behaviors for better language modeling. The LasF module is equivalent to a nonlinear approximator tailored for sequential data. By replacing the final layers of pre-trained language models with the LasF module, we obtain LasF-based models. Experiments on standard reading comprehension and question-answering tasks demonstrate that the LasF-based models consistently improve accuracy with fewer parameters. In addition, we use CommonsenseQA's blind test set to evaluate a full-parameter-tuned LasF-based model, which outperforms the prior best ensemble and single models by 0.4% and 3.1%, respectively. Furthermore, our LasF-only language model trained from scratch outperforms existing parameter-efficient language models on standard datasets such as WikiText103 and PennTreebank.
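The abstract describes a structural change: the final layers of a pre-trained language model are removed and a compact LasF module is attached in their place. The sketch below illustrates only that replacement pattern under assumed names; the paper's actual LasF formulation (functionals of semantic fields) is not given here, so LasFModule, keep_layers, and the choice of a roberta-base backbone are hypothetical placeholders, not the authors' implementation.

# Minimal sketch of the "replace the final layers with a lightweight module" pattern.
# LasFModule below is a placeholder nonlinear sequence approximator, NOT the paper's LasF.
import torch
import torch.nn as nn
from transformers import AutoModel

class LasFModule(nn.Module):
    """Placeholder: a small token-wise nonlinear approximator with a pooled classification head."""
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.mix = nn.Linear(hidden_size, hidden_size)
        self.act = nn.GELU()
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) from the truncated backbone
        h = self.act(self.mix(hidden_states))
        return self.head(h.mean(dim=1))  # mean-pool over tokens, then classify

class LasFBasedModel(nn.Module):
    def __init__(self, backbone_name: str = "roberta-base", num_labels: int = 5, keep_layers: int = 10):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        # Drop the final Transformer layers; the replacement module takes over from here.
        self.backbone.encoder.layer = self.backbone.encoder.layer[:keep_layers]
        self.lasf = LasFModule(self.backbone.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        return self.lasf(out.last_hidden_state)

The point of the sketch is the parameter trade: truncating the backbone removes whole Transformer blocks, and the attached module is far smaller, which is consistent with the abstract's claim of improved accuracy with fewer parameters.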

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-pei24c,
  title     = {Modeling Language Tokens as Functionals of Semantic Fields},
  author    = {Pei, Zhengqi and Zhang, Anran and Wang, Shuhui and Huang, Qingming},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {40114--40128},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/pei24c/pei24c.pdf},
  url       = {https://proceedings.mlr.press/v235/pei24c.html},
  abstract  = {Recent advances in natural language processing have relied heavily on Transformer-based language models. However, Transformers often require large parameter sizes and model depth. Existing Transformer-free approaches using state-space models demonstrate superiority over Transformers, yet they still lack a neuro-biological connection to the human brain. This paper proposes ${\it LasF}$, representing ${\bf L}$anguage tokens ${\bf as}$ ${\bf F}$unctionals of semantic fields, to simulate neuronal behaviors for better language modeling. The ${\it LasF}$ module is equivalent to a nonlinear approximator tailored for sequential data. By replacing the final layers of pre-trained language models with the ${\it LasF}$ module, we obtain ${\it LasF}$-based models. Experiments on standard reading comprehension and question-answering tasks demonstrate that the ${\it LasF}$-based models consistently improve accuracy with fewer parameters. In addition, we use CommonsenseQA's blind test set to evaluate a full-parameter-tuned ${\it LasF}$-based model, which outperforms the prior best ensemble and single models by $0.4\%$ and $3.1\%$, respectively. Furthermore, our ${\it LasF}$-only language model trained from scratch outperforms existing parameter-efficient language models on standard datasets such as WikiText103 and PennTreebank.}
}
Endnote
%0 Conference Paper
%T Modeling Language Tokens as Functionals of Semantic Fields
%A Zhengqi Pei
%A Anran Zhang
%A Shuhui Wang
%A Qingming Huang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-pei24c
%I PMLR
%P 40114--40128
%U https://proceedings.mlr.press/v235/pei24c.html
%V 235
%X Recent advances in natural language processing have relied heavily on Transformer-based language models. However, Transformers often require large parameter sizes and model depth. Existing Transformer-free approaches using state-space models demonstrate superiority over Transformers, yet they still lack a neuro-biological connection to the human brain. This paper proposes ${\it LasF}$, representing ${\bf L}$anguage tokens ${\bf as}$ ${\bf F}$unctionals of semantic fields, to simulate neuronal behaviors for better language modeling. The ${\it LasF}$ module is equivalent to a nonlinear approximator tailored for sequential data. By replacing the final layers of pre-trained language models with the ${\it LasF}$ module, we obtain ${\it LasF}$-based models. Experiments on standard reading comprehension and question-answering tasks demonstrate that the ${\it LasF}$-based models consistently improve accuracy with fewer parameters. In addition, we use CommonsenseQA's blind test set to evaluate a full-parameter-tuned ${\it LasF}$-based model, which outperforms the prior best ensemble and single models by $0.4\%$ and $3.1\%$, respectively. Furthermore, our ${\it LasF}$-only language model trained from scratch outperforms existing parameter-efficient language models on standard datasets such as WikiText103 and PennTreebank.
APA
Pei, Z., Zhang, A., Wang, S., & Huang, Q. (2024). Modeling Language Tokens as Functionals of Semantic Fields. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:40114-40128. Available from https://proceedings.mlr.press/v235/pei24c.html.