A Lens into Interpretable Transformer Mistakes via Semantic Dependency

Ruo-Jing Dong, Yu Yao, Bo Han, Tongliang Liu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:14260-14284, 2025.

Abstract

Semantic dependency refers to the relationship between words in a sentence in which the meaning of one word depends on another, and it is important for natural language understanding. In this paper, we investigate the role of semantic dependencies in question answering for transformer models by analyzing how token values shift in response to changes in semantics. Through extensive experiments on models including the BERT series, GPT, and LLaMA, we uncover the following key findings: (1) most tokens primarily retain their original semantic information even as they propagate through multiple layers; (2) models can encode truthful semantic dependencies in tokens at the final layer; (3) mistakes in model answers often stem from specific tokens encoded with incorrect semantic dependencies. Furthermore, we find that correcting these errors by directly adjusting parameters is challenging, because the same parameters can encode both correct and incorrect semantic dependencies depending on the context. Our findings provide insight into the causes of incorrect information generation in transformers and can inform the future development of robust and reliable models.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-dong25n,
  title     = {A Lens into Interpretable Transformer Mistakes via Semantic Dependency},
  author    = {Dong, Ruo-Jing and Yao, Yu and Han, Bo and Liu, Tongliang},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {14260--14284},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/dong25n/dong25n.pdf},
  url       = {https://proceedings.mlr.press/v267/dong25n.html},
  abstract  = {Semantic Dependency refers to the relationship between words in a sentence where the meaning of one word depends on another, which is important for natural language understanding. In this paper, we investigate the role of semantic dependencies in answering questions for transformer models, which is achieved by analyzing how token values shift in response to changes in semantics. Through extensive experiments on models including the BERT series, GPT, and LLaMA, we uncover the following key findings: 1). Most tokens primarily retain their original semantic information even as they propagate through multiple layers. 2). Models can encode truthful semantic dependencies in tokens in the final layer. 3). Mistakes in model answers often stem from specific tokens encoded with incorrect semantic dependencies. Furthermore, we found that addressing the incorrectness by directly adjusting parameters is challenging because the same parameters can encode both correct and incorrect semantic dependencies depending on the context. Our findings provide insights into the causes of incorrect information generation in transformers and help the future development of robust and reliable models.}
}
Endnote
%0 Conference Paper
%T A Lens into Interpretable Transformer Mistakes via Semantic Dependency
%A Ruo-Jing Dong
%A Yu Yao
%A Bo Han
%A Tongliang Liu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-dong25n
%I PMLR
%P 14260--14284
%U https://proceedings.mlr.press/v267/dong25n.html
%V 267
%X Semantic Dependency refers to the relationship between words in a sentence where the meaning of one word depends on another, which is important for natural language understanding. In this paper, we investigate the role of semantic dependencies in answering questions for transformer models, which is achieved by analyzing how token values shift in response to changes in semantics. Through extensive experiments on models including the BERT series, GPT, and LLaMA, we uncover the following key findings: 1). Most tokens primarily retain their original semantic information even as they propagate through multiple layers. 2). Models can encode truthful semantic dependencies in tokens in the final layer. 3). Mistakes in model answers often stem from specific tokens encoded with incorrect semantic dependencies. Furthermore, we found that addressing the incorrectness by directly adjusting parameters is challenging because the same parameters can encode both correct and incorrect semantic dependencies depending on the context. Our findings provide insights into the causes of incorrect information generation in transformers and help the future development of robust and reliable models.
APA
Dong, R., Yao, Y., Han, B. & Liu, T. (2025). A Lens into Interpretable Transformer Mistakes via Semantic Dependency. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:14260-14284. Available from https://proceedings.mlr.press/v267/dong25n.html.