XTab: Cross-table Pretraining for Tabular Transformers

Bingzhao Zhu, Xingjian Shi, Nick Erickson, Mu Li, George Karypis, Mahsa Shoaran
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:43181-43204, 2023.

Abstract

The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data. However, most existing tabular self-supervised learning models fail to leverage information across multiple data tables and cannot generalize to new tables. In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. We address the challenge of inconsistent column types and quantities among tables by utilizing independent featurizers and using federated learning to pretrain the shared component. Tested on 84 tabular prediction tasks from the OpenML-AutoML Benchmark (AMLB), we show that (1) XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers, and (2) by pretraining FT-Transformer via XTab, we achieve performance superior to other state-of-the-art tabular deep learning models on tasks including regression, binary classification, and multiclass classification.
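The abstract's central design split, a table-specific featurizer per dataset feeding a transformer backbone that is shared and pretrained across tables, can be illustrated with a short PyTorch sketch. The sketch below is an illustrative assumption of that split only: the class names (TableFeaturizer, SharedBackbone), the token dimension, and the numeric-only featurization are ours, not the paper's code, and the federated pretraining loop and pretraining objectives are omitted.

```python
# Minimal sketch of the cross-table split described in the abstract:
# per-table featurizers feed a shared transformer backbone that would be
# pretrained across tables. Names, dimensions, and the numeric-only
# featurization are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class TableFeaturizer(nn.Module):
    """Table-specific featurizer: embeds each numerical column into a token."""

    def __init__(self, n_num_cols: int, d_token: int = 64):
        super().__init__()
        # One learnable (weight, bias) pair per numerical column.
        self.weight = nn.Parameter(torch.randn(n_num_cols, d_token))
        self.bias = nn.Parameter(torch.zeros(n_num_cols, d_token))

    def forward(self, x_num: torch.Tensor) -> torch.Tensor:
        # x_num: (batch, n_num_cols) -> tokens: (batch, n_num_cols, d_token)
        return x_num.unsqueeze(-1) * self.weight + self.bias


class SharedBackbone(nn.Module):
    """Transformer backbone shared across all tables (the pretrained part)."""

    def __init__(self, d_token: int = 64, n_layers: int = 3, n_heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_token, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.encoder(tokens)


# Usage: two tables with different column counts share one backbone.
backbone = SharedBackbone()
featurizers = {"table_a": TableFeaturizer(5), "table_b": TableFeaturizer(12)}

for name, featurizer in featurizers.items():
    x = torch.randn(32, featurizer.weight.shape[0])  # a batch from this table
    tokens = featurizer(x)      # table-specific featurization
    hidden = backbone(tokens)   # shared encoder, the part transferred to new tables
    print(name, hidden.shape)
```

The point of the split is that featurizers can differ in width and type per table, while only the shared backbone weights carry information across tables and to new, unseen tables.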

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-zhu23k, title = {{XT}ab: Cross-table Pretraining for Tabular Transformers}, author = {Zhu, Bingzhao and Shi, Xingjian and Erickson, Nick and Li, Mu and Karypis, George and Shoaran, Mahsa}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {43181--43204}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/zhu23k/zhu23k.pdf}, url = {https://proceedings.mlr.press/v202/zhu23k.html}, abstract = {The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data. However, most existing tabular self-supervised learning models fail to leverage information across multiple data tables and cannot generalize to new tables. In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. We address the challenge of inconsistent column types and quantities among tables by utilizing independent featurizers and using federated learning to pretrain the shared component. Tested on 84 tabular prediction tasks from the OpenML-AutoML Benchmark (AMLB), we show that (1) XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers, (2) by pretraining FT-Transformer via XTab, we achieve superior performance than other state-of-the-art tabular deep learning models on various tasks such as regression, binary, and multiclass classification.} }
Endnote
%0 Conference Paper %T XTab: Cross-table Pretraining for Tabular Transformers %A Bingzhao Zhu %A Xingjian Shi %A Nick Erickson %A Mu Li %A George Karypis %A Mahsa Shoaran %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-zhu23k %I PMLR %P 43181--43204 %U https://proceedings.mlr.press/v202/zhu23k.html %V 202 %X The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data. However, most existing tabular self-supervised learning models fail to leverage information across multiple data tables and cannot generalize to new tables. In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. We address the challenge of inconsistent column types and quantities among tables by utilizing independent featurizers and using federated learning to pretrain the shared component. Tested on 84 tabular prediction tasks from the OpenML-AutoML Benchmark (AMLB), we show that (1) XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers, (2) by pretraining FT-Transformer via XTab, we achieve superior performance than other state-of-the-art tabular deep learning models on various tasks such as regression, binary, and multiclass classification.
APA
Zhu, B., Shi, X., Erickson, N., Li, M., Karypis, G. & Shoaran, M. (2023). XTab: Cross-table Pretraining for Tabular Transformers. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:43181-43204. Available from https://proceedings.mlr.press/v202/zhu23k.html.