ExLM: Rethinking the Impact of $\texttt{[MASK]}$ Tokens in Masked Language Models

Kangjie Zheng; Junwei Yang; Siyue Liang; Bin Feng; Zequn Liu; Wei Ju; Zhiping Xiao; Ming Zhang

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:78405-78433, 2025.

Abstract

Masked Language Models (MLMs) have achieved remarkable success in many self-supervised representation learning tasks. MLMs are trained by randomly masking portions of the input sequences with $\texttt{[MASK]}$ tokens and learning to reconstruct the original content based on the remaining context. This paper explores the impact of $\texttt{[MASK]}$ tokens on MLMs. Analytical studies show that masking tokens can introduce the corrupted semantics problem, wherein the corrupted context may convey multiple, ambiguous meanings. This problem is also a key factor affecting the performance of MLMs on downstream tasks. Based on these findings, we propose a novel enhanced-context MLM, ExLM. Our approach expands $\texttt{[MASK]}$ tokens in the input context and models the dependencies between these expanded states. This enhancement increases context capacity and enables the model to capture richer semantic information, effectively mitigating the corrupted semantics problem during pre-training. Experimental results demonstrate that ExLM achieves significant performance improvements in both text modeling and SMILES modeling tasks. Further analysis confirms that ExLM enriches semantic representations through context enhancement, and effectively reduces the semantic multimodality commonly observed in MLMs.
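
The masking step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `mask_tokens` and `expand_masks`, the masking probability `mask_prob`, and the expansion factor `k` are all assumptions introduced here. In particular, repeating each $\texttt{[MASK]}$ token `k` times is only one hypothetical reading of "expands $\texttt{[MASK]}$ tokens in the input context"; the abstract does not specify how the expansion or the dependency modeling between expanded states is done.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace each token with [MASK] independently with probability mask_prob.

    Returns the corrupted sequence and a dict {position: original token}
    holding the reconstruction targets. The default mask_prob is an
    assumption for illustration, not a value taken from the abstract.
    """
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets


def expand_masks(corrupted, k=2):
    """Hypothetical expansion: repeat every [MASK] token k times.

    This only lengthens the context; it does not model the dependencies
    between the expanded states that the abstract mentions.
    """
    expanded = []
    for tok in corrupted:
        expanded.extend([MASK] * k if tok == MASK else [tok])
    return expanded


if __name__ == "__main__":
    sentence = "the model learns to reconstruct the original content".split()
    corrupted, targets = mask_tokens(sentence, mask_prob=0.3, seed=1)
    print(corrupted)
    print(targets)
    print(expand_masks(corrupted, k=2))
```

With `k=2`, a corrupted sequence containing `m` masked positions grows by `m` tokens, which is one simple way to picture the "increased context capacity" the abstract refers to.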

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-zheng25q,
  title = 	 {{E}x{LM}: Rethinking the Impact of $\texttt{[MASK]}$ Tokens in Masked Language Models},
  author =       {Zheng, Kangjie and Yang, Junwei and Liang, Siyue and Feng, Bin and Liu, Zequn and Ju, Wei and Xiao, Zhiping and Zhang, Ming},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {78405--78433},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zheng25q/zheng25q.pdf},
  url = 	 {https://proceedings.mlr.press/v267/zheng25q.html},
  abstract = 	 {Masked Language Models (MLMs) have achieved remarkable success in many self-supervised representation learning tasks. MLMs are trained by randomly masking portions of the input sequences with $\texttt{[MASK]}$ tokens and learning to reconstruct the original content based on the remaining context. This paper explores the impact of $\texttt{[MASK]}$ tokens on MLMs. Analytical studies show that masking tokens can introduce the corrupted semantics problem, wherein the corrupted context may convey multiple, ambiguous meanings. This problem is also a key factor affecting the performance of MLMs on downstream tasks. Based on these findings, we propose a novel enhanced-context MLM, ExLM. Our approach expands $\texttt{[MASK]}$ tokens in the input context and models the dependencies between these expanded states. This enhancement increases context capacity and enables the model to capture richer semantic information, effectively mitigating the corrupted semantics problem during pre-training. Experimental results demonstrate that ExLM achieves significant performance improvements in both text modeling and SMILES modeling tasks. Further analysis confirms that ExLM enriches semantic representations through context enhancement, and effectively reduces the semantic multimodality commonly observed in MLMs.}
}

Endnote

%0 Conference Paper
%T ExLM: Rethinking the Impact of $\texttt{[MASK]}$ Tokens in Masked Language Models
%A Kangjie Zheng
%A Junwei Yang
%A Siyue Liang
%A Bin Feng
%A Zequn Liu
%A Wei Ju
%A Zhiping Xiao
%A Ming Zhang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zheng25q
%I PMLR
%P 78405--78433
%U https://proceedings.mlr.press/v267/zheng25q.html
%V 267
%X Masked Language Models (MLMs) have achieved remarkable success in many self-supervised representation learning tasks. MLMs are trained by randomly masking portions of the input sequences with $\texttt{[MASK]}$ tokens and learning to reconstruct the original content based on the remaining context. This paper explores the impact of $\texttt{[MASK]}$ tokens on MLMs. Analytical studies show that masking tokens can introduce the corrupted semantics problem, wherein the corrupted context may convey multiple, ambiguous meanings. This problem is also a key factor affecting the performance of MLMs on downstream tasks. Based on these findings, we propose a novel enhanced-context MLM, ExLM. Our approach expands $\texttt{[MASK]}$ tokens in the input context and models the dependencies between these expanded states. This enhancement increases context capacity and enables the model to capture richer semantic information, effectively mitigating the corrupted semantics problem during pre-training. Experimental results demonstrate that ExLM achieves significant performance improvements in both text modeling and SMILES modeling tasks. Further analysis confirms that ExLM enriches semantic representations through context enhancement, and effectively reduces the semantic multimodality commonly observed in MLMs.

APA

Zheng, K., Yang, J., Liang, S., Feng, B., Liu, Z., Ju, W., Xiao, Z., & Zhang, M. (2025). ExLM: Rethinking the Impact of $\texttt{[MASK]}$ Tokens in Masked Language Models. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:78405-78433. Available from https://proceedings.mlr.press/v267/zheng25q.html.
