Aligning Transformers with Weisfeiler-Leman

Luis Müller; Christopher Morris

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:36654-36704, 2024.

Abstract

Graph neural network architectures aligned with the $k$-dimensional Weisfeiler–Leman ($k$-WL) hierarchy offer theoretically well-understood expressive power. However, these architectures often fail to deliver state-of-the-art predictive performance on real-world graphs, limiting their practical utility. While recent works aligning graph transformer architectures with the $k$-WL hierarchy have shown promising empirical results, employing transformers for higher orders of $k$ remains challenging due to a prohibitive runtime and memory complexity of self-attention as well as impractical architectural assumptions, such as an infeasible number of attention heads. Here, we advance the alignment of transformers with the $k$-WL hierarchy, showing stronger expressivity results for each $k$, making them more feasible in practice. In addition, we develop a theoretical framework that allows the study of established positional encodings such as Laplacian PEs and SPE. We evaluate our transformers on the large-scale PCQM4Mv2 dataset, showing competitive predictive performance with the state-of-the-art and demonstrating strong downstream performance when fine-tuning them on small-scale molecular datasets.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-muller24c,
  title = 	 {Aligning Transformers with Weisfeiler-Leman},
  author =       {M\"{u}ller, Luis and Morris, Christopher},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {36654--36704},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/muller24c/muller24c.pdf},
  url = 	 {https://proceedings.mlr.press/v235/muller24c.html},
  abstract = 	 {Graph neural network architectures aligned with the $k$-dimensional Weisfeiler–Leman ($k$-WL) hierarchy offer theoretically well-understood expressive power. However, these architectures often fail to deliver state-of-the-art predictive performance on real-world graphs, limiting their practical utility. While recent works aligning graph transformer architectures with the $k$-WL hierarchy have shown promising empirical results, employing transformers for higher orders of $k$ remains challenging due to a prohibitive runtime and memory complexity of self-attention as well as impractical architectural assumptions, such as an infeasible number of attention heads. Here, we advance the alignment of transformers with the $k$-WL hierarchy, showing stronger expressivity results for each $k$, making them more feasible in practice. In addition, we develop a theoretical framework that allows the study of established positional encodings such as Laplacian PEs and SPE. We evaluate our transformers on the large-scale PCQM4Mv2 dataset, showing competitive predictive performance with the state-of-the-art and demonstrating strong downstream performance when fine-tuning them on small-scale molecular datasets.}
}

Endnote

%0 Conference Paper
%T Aligning Transformers with Weisfeiler-Leman
%A Luis Müller
%A Christopher Morris
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-muller24c
%I PMLR
%P 36654--36704
%U https://proceedings.mlr.press/v235/muller24c.html
%V 235
%X Graph neural network architectures aligned with the $k$-dimensional Weisfeiler–Leman ($k$-WL) hierarchy offer theoretically well-understood expressive power. However, these architectures often fail to deliver state-of-the-art predictive performance on real-world graphs, limiting their practical utility. While recent works aligning graph transformer architectures with the $k$-WL hierarchy have shown promising empirical results, employing transformers for higher orders of $k$ remains challenging due to a prohibitive runtime and memory complexity of self-attention as well as impractical architectural assumptions, such as an infeasible number of attention heads. Here, we advance the alignment of transformers with the $k$-WL hierarchy, showing stronger expressivity results for each $k$, making them more feasible in practice. In addition, we develop a theoretical framework that allows the study of established positional encodings such as Laplacian PEs and SPE. We evaluate our transformers on the large-scale PCQM4Mv2 dataset, showing competitive predictive performance with the state-of-the-art and demonstrating strong downstream performance when fine-tuning them on small-scale molecular datasets.

APA


Müller, L. & Morris, C. (2024). Aligning Transformers with Weisfeiler-Leman. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:36654-36704. Available from https://proceedings.mlr.press/v235/muller24c.html.
