Modeling Language Tokens as Functionals of Semantic Fields
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:40114-40128, 2024.
Abstract
Recent advances in natural language processing have relied heavily on Transformer-based language models. However, Transformers typically require large parameter counts and considerable model depth. Existing Transformer-free approaches based on state-space models demonstrate superiority over Transformers, yet they still lack a neurobiological connection to the human brain. This paper proposes LasF, which represents Language tokens as Functionals of semantic fields, to simulate neuronal behaviors for better language modeling. The LasF module is equivalent to a nonlinear approximator tailored for sequential data. By replacing the final layers of pre-trained language models with the LasF module, we obtain LasF-based models. Experiments on standard reading-comprehension and question-answering tasks demonstrate that LasF-based models consistently improve accuracy with fewer parameters. In addition, we use CommonsenseQA's blind test set to evaluate a full-parameter-tuned LasF-based model, which outperforms the prior best ensemble and single models by 0.4 and 3.1 points, respectively. Furthermore, our LasF-only language model trained from scratch outperforms existing parameter-efficient language models on standard datasets such as WikiText-103 and Penn Treebank.
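The abstract describes building "LasF-based models" by swapping the final layers of a pre-trained language model for the LasF module. The sketch below only illustrates that layer-replacement surgery; the actual LasF architecture is not specified here, so the `LasFBlock` class, its internals, and all sizes are hypothetical placeholders that merely preserve the sequence-to-sequence interface of the layers being replaced.

```python
# Minimal, hypothetical sketch of the layer-replacement idea (not the paper's
# actual LasF design). LasFBlock is a stand-in nonlinear sequence map with the
# same (batch, seq_len, d_model) interface as the Transformer layer it replaces.
import torch
import torch.nn as nn


class LasFBlock(nn.Module):
    """Placeholder nonlinear approximator over token sequences (assumed interface)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.fc_in = nn.Linear(d_model, 4 * d_model)
        self.fc_out = nn.Linear(4 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); the residual keeps the block drop-in.
        return x + self.fc_out(torch.tanh(self.fc_in(x)))


# Toy "pre-trained" backbone: embedding, a stack of Transformer layers, LM head.
d_model, n_layers, vocab = 128, 6, 1000
embed = nn.Embedding(vocab, d_model)
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    for _ in range(n_layers)
)
head = nn.Linear(d_model, vocab)

# "LasF-based model": replace the final Transformer layer with a LasF-style block.
layers[-1] = LasFBlock(d_model)

# Forward pass over dummy token ids to check the interfaces line up.
tokens = torch.randint(0, vocab, (2, 16))
h = embed(tokens)
for layer in layers:
    h = layer(h)
logits = head(h)  # shape: (2, 16, vocab)
```

In practice one would load real pre-trained weights and fine-tune (or fully tune) the modified model; the toy backbone above simply keeps the example self-contained.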