TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications

David Salinas, Nick Erickson
Proceedings of the Third International Conference on Automated Machine Learning, PMLR 256:19/1-30, 2024.

Abstract

We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1206 models evaluated on 200 regression and classification datasets. We illustrate the benefits of our dataset in multiple ways. First, we show that it enables analyses such as comparing Hyperparameter Optimization against current AutoML systems, while also considering ensembling at no extra cost by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer learning. In particular, we show that applying standard transfer-learning techniques makes it possible to outperform current state-of-the-art tabular systems in accuracy, runtime and latency.
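The abstract does not spell out how ensembling is performed; the sketch below assumes greedy ensemble selection with replacement in the style of Caruana et al., a common choice in AutoML systems, purely for illustration. It shows why precomputed predictions make ensembling essentially free: the loop only averages stored prediction vectors and never refits a model. The model names, the toy data, the squared-error metric and the helpers `greedy_ensemble` and `ensemble_predict` are all hypothetical and are not part of TabRepo's API.

```python
from collections import Counter
from typing import Dict, List, Optional


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def greedy_ensemble(
    val_preds: Dict[str, List[float]],
    y_val: List[float],
    n_iterations: int = 10,
) -> Dict[str, float]:
    """Hypothetical helper: Caruana-style greedy ensemble selection with replacement.

    Uses only stored validation predictions, so no model is retrained.
    Returns a mapping from model name to its weight in the ensemble.
    """
    selected: List[str] = []
    running_sum = [0.0] * len(y_val)  # sum of predictions of the models picked so far
    for _ in range(n_iterations):
        best_name: Optional[str] = None
        best_score = float("inf")
        k = len(selected) + 1  # ensemble size if one more model is added
        for name in sorted(val_preds):  # sorted order makes tie-breaking deterministic
            candidate = [(s + p) / k for s, p in zip(running_sum, val_preds[name])]
            score = mse(candidate, y_val)
            if score < best_score:
                best_name, best_score = name, score
        assert best_name is not None
        selected.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_name])]
    # A model picked c times out of len(selected) gets weight c / len(selected).
    counts = Counter(selected)
    return {name: c / len(selected) for name, c in counts.items()}


def ensemble_predict(weights: Dict[str, float], test_preds: Dict[str, List[float]]) -> List[float]:
    """Hypothetical helper: weighted average of stored test predictions, again without refitting."""
    n = len(next(iter(test_preds.values())))
    return [sum(w * test_preds[name][i] for name, w in weights.items()) for i in range(n)]


# Toy, hypothetical precomputed predictions for three models.
y_val = [1.0, 2.0, 3.0, 4.0]
val_preds = {
    "model_a": [1.5, 2.5, 3.5, 4.5],  # over-predicts by 0.5
    "model_b": [0.5, 1.5, 2.5, 3.5],  # under-predicts by 0.5
    "model_c": [1.0, 1.0, 1.0, 1.0],  # constant, poor
}
test_preds = {
    "model_a": [2.5, 3.5],
    "model_b": [1.5, 2.5],
    "model_c": [0.0, 0.0],
}

weights = greedy_ensemble(val_preds, y_val, n_iterations=4)
print(weights)                                # {'model_a': 0.5, 'model_b': 0.5}
print(ensemble_predict(weights, test_preds))  # [2.0, 3.0]
```

Because the selection loop only reads stored prediction vectors, its cost does not depend on how expensive the underlying models were to train. In this toy case the two oppositely biased models cancel each other out, and the poor constant model is never selected.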

Cite this Paper


BibTeX
@InProceedings{pmlr-v256-salinas24a,
  title = {TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications},
  author = {Salinas, David and Erickson, Nick},
  booktitle = {Proceedings of the Third International Conference on Automated Machine Learning},
  pages = {19/1--30},
  year = {2024},
  editor = {Eggensperger, Katharina and Garnett, Roman and Vanschoren, Joaquin and Lindauer, Marius and Gardner, Jacob R.},
  volume = {256},
  series = {Proceedings of Machine Learning Research},
  month = {09--12 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v256/main/assets/salinas24a/salinas24a.pdf},
  url = {https://proceedings.mlr.press/v256/salinas24a.html},
  abstract = {We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1206 models evaluated on 200 regression and classification datasets. We illustrate the benefit of our datasets in multiple ways. First, we show that it allows to perform analysis such as comparing Hyperparameter Optimization against current AutoML systems while also considering ensembling at no cost by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer-learning. In particular, we show that applying standard transfer-learning techniques allows to outperform current state-of-the-art tabular systems in accuracy, runtime and latency.}
}
Endnote
%0 Conference Paper
%T TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications
%A David Salinas
%A Nick Erickson
%B Proceedings of the Third International Conference on Automated Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Katharina Eggensperger
%E Roman Garnett
%E Joaquin Vanschoren
%E Marius Lindauer
%E Jacob R. Gardner
%F pmlr-v256-salinas24a
%I PMLR
%P 19/1--30
%U https://proceedings.mlr.press/v256/salinas24a.html
%V 256
%X We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1206 models evaluated on 200 regression and classification datasets. We illustrate the benefit of our datasets in multiple ways. First, we show that it allows to perform analysis such as comparing Hyperparameter Optimization against current AutoML systems while also considering ensembling at no cost by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer-learning. In particular, we show that applying standard transfer-learning techniques allows to outperform current state-of-the-art tabular systems in accuracy, runtime and latency.
APA
Salinas, D., & Erickson, N. (2024). TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications. Proceedings of the Third International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 256:19/1-30. Available from https://proceedings.mlr.press/v256/salinas24a.html.
