Trompt: Towards a Better Deep Neural Network for Tabular Data

Kuan-Yu Chen, Ping-Han Chiang, Hsin-Rung Chou, Ting-Wei Chen, Tien-Hao Chang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:4392-4434, 2023.

Abstract

Tabular data is arguably one of the most commonly used data structures in various practical domains, including finance, healthcare, and e-commerce. Its inherent heterogeneity allows tabular data to store rich information. However, based on a recently published tabular benchmark, deep neural networks still fall behind tree-based models on tabular datasets. In this paper, we propose Trompt (which stands for Tabular Prompt), a novel architecture inspired by prompt learning in language models. The essence of prompt learning is to adjust a large pre-trained model through a set of prompts outside the model, without directly modifying the model itself. Based on this idea, Trompt separates the learning strategy for tabular data into two parts. The first part, analogous to the pre-trained model, focuses on learning the intrinsic information of a table. The second part, analogous to the prompts, focuses on learning the variations among samples. Trompt is evaluated on the benchmark mentioned above. The experimental results demonstrate that Trompt outperforms state-of-the-art deep neural networks and is comparable to tree-based models.
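As a rough illustration of the two-part design described above, consider the sketch below: learnable column embeddings stand in for the table-intrinsic part (the analogue of the pre-trained model), while learnable prompt embeddings produce per-prompt feature importances that reweight each sample's feature embeddings (the analogue of the prompts). The class name TromptStyleCell, the attention-style weighting, and all dimensions are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn

class TromptStyleCell(nn.Module):
    """Sketch of the two-part idea from the abstract (NOT the authors'
    exact architecture): column embeddings capture table-intrinsic
    information shared across samples; prompt embeddings capture
    per-sample variation via attention-style feature importances."""

    def __init__(self, n_columns: int, n_prompts: int, d: int):
        super().__init__()
        # Part 1: intrinsic information of the table, shared by all samples.
        self.column_emb = nn.Parameter(torch.randn(n_columns, d))
        # Part 2: prompts that adjust behavior without modifying Part 1.
        self.prompt_emb = nn.Parameter(torch.randn(n_prompts, d))

    def forward(self, feature_emb: torch.Tensor) -> torch.Tensor:
        # feature_emb: (batch, n_columns, d) per-sample embedded features.
        scores = self.prompt_emb @ self.column_emb.T   # (n_prompts, n_columns)
        importances = scores.softmax(dim=-1)           # normalize over columns
        # Reweight each sample's feature embeddings by the importances.
        return torch.einsum("pc,bcd->bpd", importances, feature_emb)

# Usage with hypothetical sizes: 10 columns, 4 prompts, 16-dim embeddings.
x = torch.randn(32, 10, 16)
out = TromptStyleCell(n_columns=10, n_prompts=4, d=16)(x)
print(out.shape)  # torch.Size([32, 4, 16])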

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-chen23c,
  title = {Trompt: Towards a Better Deep Neural Network for Tabular Data},
  author = {Chen, Kuan-Yu and Chiang, Ping-Han and Chou, Hsin-Rung and Chen, Ting-Wei and Chang, Tien-Hao},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {4392--4434},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/chen23c/chen23c.pdf},
  url = {https://proceedings.mlr.press/v202/chen23c.html},
  abstract = {Tabular data is arguably one of the most commonly used data structures in various practical domains, including finance, healthcare, and e-commerce. Its inherent heterogeneity allows tabular data to store rich information. However, based on a recently published tabular benchmark, deep neural networks still fall behind tree-based models on tabular datasets. In this paper, we propose Trompt (which stands for Tabular Prompt), a novel architecture inspired by prompt learning in language models. The essence of prompt learning is to adjust a large pre-trained model through a set of prompts outside the model, without directly modifying the model itself. Based on this idea, Trompt separates the learning strategy for tabular data into two parts. The first part, analogous to the pre-trained model, focuses on learning the intrinsic information of a table. The second part, analogous to the prompts, focuses on learning the variations among samples. Trompt is evaluated on the benchmark mentioned above. The experimental results demonstrate that Trompt outperforms state-of-the-art deep neural networks and is comparable to tree-based models.}
}
Endnote
%0 Conference Paper
%T Trompt: Towards a Better Deep Neural Network for Tabular Data
%A Kuan-Yu Chen
%A Ping-Han Chiang
%A Hsin-Rung Chou
%A Ting-Wei Chen
%A Tien-Hao Chang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-chen23c
%I PMLR
%P 4392--4434
%U https://proceedings.mlr.press/v202/chen23c.html
%V 202
%X Tabular data is arguably one of the most commonly used data structures in various practical domains, including finance, healthcare, and e-commerce. Its inherent heterogeneity allows tabular data to store rich information. However, based on a recently published tabular benchmark, deep neural networks still fall behind tree-based models on tabular datasets. In this paper, we propose Trompt (which stands for Tabular Prompt), a novel architecture inspired by prompt learning in language models. The essence of prompt learning is to adjust a large pre-trained model through a set of prompts outside the model, without directly modifying the model itself. Based on this idea, Trompt separates the learning strategy for tabular data into two parts. The first part, analogous to the pre-trained model, focuses on learning the intrinsic information of a table. The second part, analogous to the prompts, focuses on learning the variations among samples. Trompt is evaluated on the benchmark mentioned above. The experimental results demonstrate that Trompt outperforms state-of-the-art deep neural networks and is comparable to tree-based models.
APA
Chen, K., Chiang, P., Chou, H., Chen, T. & Chang, T. (2023). Trompt: Towards a Better Deep Neural Network for Tabular Data. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:4392-4434. Available from https://proceedings.mlr.press/v202/chen23c.html.