Adapting Graph-Based Analysis for Knowledge Extraction from Transformer Models
Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, PMLR 284:1-14, 2025.
Abstract
Transformer models, despite their exceptional capabilities in Natural Language Processing (NLP) and Vision tasks, often function as "black boxes": like other deep neural network models, their internal processes remain largely opaque due to their complex architectures. This work extends graph-based knowledge extraction techniques, previously applied to CNNs, to the domain of Transformer models. The inner mechanics of Transformer models are explored by constructing a co-activation graph from their encoder layers. The nodes of the graph represent the hidden units within each encoder layer, while the edges represent the statistical correlations between these hidden units. The magnitude of co-activation, i.e., the correlation between the activations of two hidden units, determines the strength of their connection within the graph. Our research focuses on encoder-only Transformer classifiers. We conducted experiments involving a custom-built Transformer and a pre-trained BERT model on an NLP task. We used graph analysis to detect semantically related class clusters and to examine their impact on misclassification patterns. We demonstrate a positive correlation between class similarity and the frequency of classification errors. Our findings suggest that co-activation graphs reveal structured, interpretable representations in Transformers, consistent with prior findings on knowledge extraction from CNNs.
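To make the graph construction described above concrete, the following is a minimal sketch of how a co-activation graph might be built from recorded hidden-unit activations. The abstract only states that edges encode statistical correlations between hidden units; the use of Pearson correlation over a probe set, the absolute-value edge weights, the thresholding step, and the function name and shapes are illustrative assumptions, not the paper's specified procedure.

```python
import numpy as np


def build_coactivation_graph(activations, threshold=0.3):
    """Sketch of co-activation graph construction (assumed procedure).

    activations: array of shape (num_samples, num_units), where each column
    holds one hidden unit's activation across a set of probe inputs
    (e.g., collected from an encoder layer during a forward pass).
    Returns a weighted adjacency matrix; weak correlations are zeroed out.
    """
    # Pearson correlation between every pair of hidden units (columns).
    corr = np.corrcoef(activations, rowvar=False)
    # Edge strength = magnitude of co-activation between the two units.
    adj = np.abs(corr)
    np.fill_diagonal(adj, 0.0)      # no self-loops
    adj[adj < threshold] = 0.0      # keep only sufficiently strong edges
    return adj


# Toy usage with random data standing in for real encoder activations.
rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 64))   # 1000 probe inputs, 64 hidden units
graph = build_coactivation_graph(acts)
print(graph.shape)                        # (64, 64) adjacency matrix
```

The resulting weighted adjacency matrix could then be handed to standard graph-analysis tooling (e.g., community detection) to look for the semantically related class clusters the abstract refers to.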