Position: Relational Deep Learning - Graph Representation Learning on Relational Databases

Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, Jure Leskovec
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:13592-13607, 2024.

Abstract

Much of the world’s most valued data is stored in relational databases and data warehouses, where the data is organized into tables connected by primary-foreign key relations. However, building machine learning models using this data is both challenging and time-consuming because no ML algorithm can directly learn from multiple connected tables. Current approaches can only learn from a single table, so data must first be manually joined and aggregated into this format, a laborious process known as feature engineering. Feature engineering is slow, error-prone, and leads to suboptimal models. Here we introduce Relational Deep Learning (RDL), a blueprint for end-to-end learning on relational databases. The key is to represent relational databases as temporal, heterogeneous graphs, with a node for each row in each table, and edges specified by primary-foreign key links. Graph Neural Networks then learn representations that leverage all input data, without any manual feature engineering. We also introduce RelBench, a benchmark and testing suite, demonstrating strong initial results. Overall, we define a new research area that generalizes graph machine learning and broadens its applicability.
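
The graph construction described in the abstract (one node per row, one edge per primary-foreign key link) can be illustrated with a minimal sketch. This is not the authors' implementation or the RelBench API; the toy tables, the column names, and the `build_hetero_graph` helper are hypothetical and exist only for illustration. The `timestamp` column stands in for the temporal information that a full pipeline would attach to each node.

```python
from collections import defaultdict

# Hypothetical toy database: two tables linked by a primary-foreign key relation.
# Table and column names are illustrative assumptions, not taken from the paper.
tables = {
    "customers": [
        {"customer_id": 1, "name": "Ada"},
        {"customer_id": 2, "name": "Grace"},
    ],
    "orders": [
        {"order_id": 10, "customer_id": 1, "amount": 25.0, "timestamp": 3},
        {"order_id": 11, "customer_id": 1, "amount": 40.0, "timestamp": 7},
        {"order_id": 12, "customer_id": 2, "amount": 15.0, "timestamp": 5},
    ],
}
primary_keys = {"customers": "customer_id", "orders": "order_id"}
# (child table, foreign-key column) -> parent table
foreign_keys = {("orders", "customer_id"): "customers"}


def build_hetero_graph(tables, primary_keys, foreign_keys):
    """Return typed nodes and typed edges for a relational database."""
    # One node per row, identified by (table name, primary-key value).
    nodes = {
        (table, row[primary_keys[table]]): row
        for table, rows in tables.items()
        for row in rows
    }
    # One edge per primary-foreign key link, grouped by edge type.
    edges = defaultdict(list)
    for (child, fk_col), parent in foreign_keys.items():
        for row in tables[child]:
            src = (child, row[primary_keys[child]])
            dst = (parent, row[fk_col])
            if dst in nodes:  # skip dangling foreign keys
                edges[(child, fk_col, parent)].append((src, dst))
    return nodes, dict(edges)


nodes, edges = build_hetero_graph(tables, primary_keys, foreign_keys)
print(len(nodes))
print(edges[("orders", "customer_id", "customers")])
```

In this sketch each table becomes a node type and each foreign-key column becomes an edge type, so a Graph Neural Network operating on the result would see the original row attributes as node features rather than hand-built aggregates.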

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-fey24a,
  title     = {Position: Relational Deep Learning - Graph Representation Learning on Relational Databases},
  author    = {Fey, Matthias and Hu, Weihua and Huang, Kexin and Lenssen, Jan Eric and Ranjan, Rishabh and Robinson, Joshua and Ying, Rex and You, Jiaxuan and Leskovec, Jure},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {13592--13607},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/fey24a/fey24a.pdf},
  url       = {https://proceedings.mlr.press/v235/fey24a.html},
  abstract  = {Much of the world’s most valued data is stored in relational databases and data warehouses, where the data is organized into tables connected by primary-foreign key relations. However, building machine learning models using this data is both challenging and time-consuming because no ML algorithm can directly learn from multiple connected tables. Current approaches can only learn from a single table, so data must first be manually joined and aggregated into this format, a laborious process known as feature engineering. Feature engineering is slow, error-prone, and leads to suboptimal models. Here we introduce Relational Deep Learning (RDL), a blueprint for end-to-end learning on relational databases. The key is to represent relational databases as temporal, heterogeneous graphs, with a node for each row in each table, and edges specified by primary-foreign key links. Graph Neural Networks then learn representations that leverage all input data, without any manual feature engineering. We also introduce RelBench, a benchmark and testing suite, demonstrating strong initial results. Overall, we define a new research area that generalizes graph machine learning and broadens its applicability.}
}
Endnote
%0 Conference Paper
%T Position: Relational Deep Learning - Graph Representation Learning on Relational Databases
%A Matthias Fey
%A Weihua Hu
%A Kexin Huang
%A Jan Eric Lenssen
%A Rishabh Ranjan
%A Joshua Robinson
%A Rex Ying
%A Jiaxuan You
%A Jure Leskovec
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-fey24a
%I PMLR
%P 13592--13607
%U https://proceedings.mlr.press/v235/fey24a.html
%V 235
%X Much of the world’s most valued data is stored in relational databases and data warehouses, where the data is organized into tables connected by primary-foreign key relations. However, building machine learning models using this data is both challenging and time-consuming because no ML algorithm can directly learn from multiple connected tables. Current approaches can only learn from a single table, so data must first be manually joined and aggregated into this format, a laborious process known as feature engineering. Feature engineering is slow, error-prone, and leads to suboptimal models. Here we introduce Relational Deep Learning (RDL), a blueprint for end-to-end learning on relational databases. The key is to represent relational databases as temporal, heterogeneous graphs, with a node for each row in each table, and edges specified by primary-foreign key links. Graph Neural Networks then learn representations that leverage all input data, without any manual feature engineering. We also introduce RelBench, a benchmark and testing suite, demonstrating strong initial results. Overall, we define a new research area that generalizes graph machine learning and broadens its applicability.
APA
Fey, M., Hu, W., Huang, K., Lenssen, J. E., Ranjan, R., Robinson, J., Ying, R., You, J., & Leskovec, J. (2024). Position: Relational Deep Learning - Graph Representation Learning on Relational Databases. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:13592-13607. Available from https://proceedings.mlr.press/v235/fey24a.html.

Related Material