Modeling Language Tokens as Functionals of Semantic Fields
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:40114-40128, 2024.
Abstract
Recent advances in natural language processing have relied heavily on Transformer-based language models. However, Transformers often require large parameter counts and considerable model depth. Existing Transformer-free approaches using state-space models demonstrate superiority over Transformers, yet they still lack a neurobiological connection to the human brain. This paper proposes ${\it LasF}$, representing ${\bf L}$anguage tokens ${\bf as}$ ${\bf F}$unctionals of semantic fields, to simulate neuronal behaviors for better language modeling. The ${\it LasF}$ module is equivalent to a nonlinear approximator tailored for sequential data. By replacing the final layers of pre-trained language models with the ${\it LasF}$ module, we obtain ${\it LasF}$-based models. Experiments on standard reading comprehension and question-answering tasks demonstrate that the ${\it LasF}$-based models consistently improve accuracy with fewer parameters. In addition, we evaluate a full-parameter-tuned ${\it LasF}$-based model on CommonsenseQA's blind test set, where it outperforms the prior best ensemble and single models by $0.4\%$ and $3.1\%$, respectively. Furthermore, our ${\it LasF}$-only language model trained from scratch outperforms existing parameter-efficient language models on standard datasets such as WikiText-103 and Penn Treebank.
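The abstract describes obtaining ${\it LasF}$-based models by replacing the final layers of a pre-trained language model with the ${\it LasF}$ module. The sketch below illustrates only that layer-swapping idea, assuming a Hugging Face-style encoder; the internals of ${\it LasF}$ are not given in the abstract, so `NonlinearSeqModule`, the number of dropped layers `n_drop`, and the backbone name are hypothetical placeholders rather than the paper's actual module.

```python
# Hypothetical sketch (not the authors' code): replace the final encoder layers
# of a pre-trained Transformer with a drop-in nonlinear sequence module.
import torch
import torch.nn as nn
from transformers import AutoModel


class NonlinearSeqModule(nn.Module):
    """Placeholder for a LasF-style nonlinear approximator over sequences.

    The abstract does not specify the module's internals, so a simple
    position-wise gated transformation stands in for the real design.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.gate = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        return torch.tanh(self.proj(hidden_states)) * torch.sigmoid(self.gate(hidden_states))


# Load a pre-trained backbone and drop its last n_drop Transformer layers
# (both the backbone choice and n_drop are illustrative assumptions).
backbone = AutoModel.from_pretrained("roberta-base")
n_drop = 2
backbone.encoder.layer = nn.ModuleList(backbone.encoder.layer[:-n_drop])
replacement = NonlinearSeqModule(backbone.config.hidden_size)


def encode(input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Truncated backbone produces hidden states; the replacement module
    # takes over where the removed layers used to be. A task head
    # (e.g., a QA span classifier) would consume the returned tensor.
    hidden = backbone(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
    return replacement(hidden)
```

In this reading, the replacement module is trained (here it would be full-parameter tuning, as the abstract mentions for CommonsenseQA) while the truncated backbone supplies contextual features, which is one plausible way the reported parameter savings could arise.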