Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer

Yulun Wu, Doron L Bergman
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:67111-67127, 2025.

Abstract

We present an Adversarially Pre-trained Transformer (APT) that performs zero-shot meta-learning on tabular prediction tasks without using any real-world dataset for pre-training, extending the recent line of work on Prior-Data Fitted Networks (PFNs) and TabPFN. Specifically, APT is pre-trained against adversarial synthetic data agents, which continually shift their underlying data-generating distribution and deliberately challenge the model with different synthetic datasets. In addition, we propose a mixture block model architecture that can handle classification tasks with an arbitrary number of classes, addressing the class-size limitation, a crucial weakness of prior tabular zero-shot learning algorithms. In experiments, we show that our framework matches state-of-the-art performance on small tabular classification tasks without filtering on dataset characteristics such as the number of classes or the number of missing values, while maintaining an average runtime under one second. On common benchmark dataset suites in both classification and regression, we show that adversarial pre-training enhances TabPFN’s performance. In our analysis, we demonstrate that the adversarial synthetic data agents generate a more diverse collection of data than the ordinary random generator in TabPFN, and that our mixture block neural design improves generalizability and greatly accelerates pre-training.
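
To make the adversarial pre-training idea concrete, below is a minimal PyTorch sketch of the min-max loop the abstract describes: a learnable synthetic data agent samples tabular tasks, the transformer learns in-context from them, and the agent's gradients are sign-flipped so it keeps shifting its distribution toward harder tasks. This is an illustration under our own assumptions, not the authors' implementation; the names (SyntheticDataAgent, ToyPFN, pretrain_step) and the toy architecture are hypothetical stand-ins for APT's actual prior and TabPFN-style model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SyntheticDataAgent(nn.Module):
    """Hypothetical adversarial data agent: a small generator whose
    learnable parameters define the data-generating distribution over
    synthetic tabular tasks."""
    def __init__(self, n_features=10, latent_dim=16, hidden=32):
        super().__init__()
        self.latent_dim = latent_dim
        self.gen = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_features),
        )
        self.labeler = nn.Linear(n_features, 1)  # fixed random labeling rule

    def sample_task(self, n_rows=128):
        # Features are a learnable transform of noise, so gradients flow
        # from the model's loss back into the agent's parameters.
        z = torch.randn(n_rows, self.latent_dim)
        x = self.gen(z)
        # Hard labels from a thresholded score (detached: labels carry no
        # gradient; the adversarial signal flows through the features).
        y = (self.labeler(x.detach()).squeeze(-1) > 0).long()
        return x, y

class ToyPFN(nn.Module):
    """Tiny TabPFN-style in-context learner. Note the fixed n_classes
    output head: this is exactly the class-size limitation that APT's
    mixture block architecture is designed to remove."""
    def __init__(self, n_features=10, d_model=64, n_classes=2):
        super().__init__()
        self.embed_x = nn.Linear(n_features, d_model)
        self.embed_y = nn.Embedding(n_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x_train, y_train, x_test):
        # In-context prediction: condition on labeled train rows and
        # predict the test rows in a single forward pass, no fine-tuning.
        train_tok = self.embed_x(x_train) + self.embed_y(y_train)
        test_tok = self.embed_x(x_test)
        seq = torch.cat([train_tok, test_tok], dim=0).unsqueeze(0)
        h = self.encoder(seq).squeeze(0)
        return self.head(h[len(x_train):])  # logits for the test rows

def pretrain_step(model, agent, opt_model, opt_agent, n_train=96):
    x, y = agent.sample_task(n_rows=128)
    x_tr, y_tr = x[:n_train], y[:n_train]
    x_te, y_te = x[n_train:], y[n_train:]
    loss = F.cross_entropy(model(x_tr, y_tr, x_te), y_te)

    opt_model.zero_grad()
    opt_agent.zero_grad()
    loss.backward()
    # The agent ascends the same loss the model descends: flipping its
    # gradients makes it deliberately challenge the model with harder
    # synthetic datasets as pre-training progresses.
    for p in agent.parameters():
        if p.grad is not None:
            p.grad.neg_()
    opt_model.step()
    opt_agent.step()
    return loss.item()

model, agent = ToyPFN(), SyntheticDataAgent()
opt_model = torch.optim.Adam(model.parameters(), lr=1e-4)
opt_agent = torch.optim.Adam(agent.parameters(), lr=1e-4)
for step in range(100):
    pretrain_step(model, agent, opt_model, opt_agent)
```

In this framing, the agent's learnable task distribution plays the role of TabPFN's fixed random prior; the sign-flipped gradient step is one common way to realize the "deliberately challenge the model" objective, though the paper's agents and priors are considerably richer than this sketch.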

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wu25c,
  title     = {Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer},
  author    = {Wu, Yulun and Bergman, Doron L},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {67111--67127},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wu25c/wu25c.pdf},
  url       = {https://proceedings.mlr.press/v267/wu25c.html},
  abstract  = {We present an Adversarially Pre-trained Transformer (APT) that is able to perform zero-shot meta-learning on tabular prediction tasks without using any real-world dataset to pre-train the model, extending on the recent development of Prior-Data Fitted Networks (PFNs) and TabPFN. Specifically, APT is pre-trained with adversarial synthetic data agents, who continue to shift their underlying data generating distribution and deliberately challenge the model with different synthetic datasets. In addition, we propose a mixture block model architecture that is able to handle classification tasks with arbitrary number of classes, addressing the class size limitation – a crucial weakness of prior tabular zero-shot learning algorithms. In experiments, we show that our framework matches state-of-the-art performance on small tabular classification tasks without filtering on dataset characteristics such as number of classes and number of missing values, while maintaining an average runtime under one second. On common benchmark dataset suites in both classification and regression, we show that adversarial pre-training was able to enhance TabPFN’s performance. In our analysis, we demonstrate that the adversarial synthetic data agents were able to generate a more diverse collection of data compared to the ordinary random generator in TabPFN. In addition, we demonstrate that our mixture block neural design has improved generalizability and greatly accelerated pre-training.}
}
Endnote
%0 Conference Paper
%T Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer
%A Yulun Wu
%A Doron L Bergman
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wu25c
%I PMLR
%P 67111--67127
%U https://proceedings.mlr.press/v267/wu25c.html
%V 267
%X We present an Adversarially Pre-trained Transformer (APT) that is able to perform zero-shot meta-learning on tabular prediction tasks without using any real-world dataset to pre-train the model, extending on the recent development of Prior-Data Fitted Networks (PFNs) and TabPFN. Specifically, APT is pre-trained with adversarial synthetic data agents, who continue to shift their underlying data generating distribution and deliberately challenge the model with different synthetic datasets. In addition, we propose a mixture block model architecture that is able to handle classification tasks with arbitrary number of classes, addressing the class size limitation – a crucial weakness of prior tabular zero-shot learning algorithms. In experiments, we show that our framework matches state-of-the-art performance on small tabular classification tasks without filtering on dataset characteristics such as number of classes and number of missing values, while maintaining an average runtime under one second. On common benchmark dataset suites in both classification and regression, we show that adversarial pre-training was able to enhance TabPFN’s performance. In our analysis, we demonstrate that the adversarial synthetic data agents were able to generate a more diverse collection of data compared to the ordinary random generator in TabPFN. In addition, we demonstrate that our mixture block neural design has improved generalizability and greatly accelerated pre-training.
APA
Wu, Y. & Bergman, D. L. (2025). Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:67111-67127. Available from https://proceedings.mlr.press/v267/wu25c.html.