Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning

Sungwon Han, Jinsung Yoon, Sercan O Arik, Tomas Pfister
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:17454-17479, 2024.

Abstract

Large Language Models (LLMs), with their remarkable ability to tackle challenging and unseen reasoning problems, hold immense potential for tabular learning, which is vital for many real-world applications. In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input dataset that is optimally suited for tabular predictions. The generated features are used to infer class likelihood with a simple downstream machine learning model, such as linear regression, yielding high-performance few-shot learning. The proposed FeatLLM framework uses only this simple predictive model with the discovered features at inference time. Compared to existing LLM-based approaches, FeatLLM eliminates the need to send queries to the LLM for each sample at inference time. Moreover, it requires merely API-level access to LLMs and overcomes prompt size limitations. As demonstrated across numerous tabular datasets from a wide range of domains, FeatLLM generates high-quality rules, significantly (10% on average) outperforming alternatives such as TabLLM and STUNT.
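
To make the described workflow concrete, the sketch below illustrates (in Python) the kind of pipeline the abstract outlines: rules proposed once by an LLM are applied to raw records to build features, a simple linear model is fit on the few-shot examples, and inference uses only the rules plus that model, with no per-sample LLM queries. This is a minimal illustration, not the authors' implementation; the rule set, column names, and toy data are invented stand-ins for what the LLM would actually generate from a given dataset.

```python
# Minimal sketch of a FeatLLM-style pipeline (hypothetical rules and data,
# NOT the paper's code). In the real method, the LLM is queried only at
# training time to propose feature-extraction rules from few-shot examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical LLM-proposed rules: each maps a raw record to a binary feature.
rules = [
    lambda row: row["age"] > 40,
    lambda row: row["hours_per_week"] >= 45,
    lambda row: row["education"] in {"Bachelors", "Masters", "Doctorate"},
]

def featurize(records):
    """Apply the discovered rules to raw records, giving a binary feature matrix."""
    return np.array([[float(rule(r)) for rule in rules] for r in records])

# Few-shot labeled examples (toy data standing in for a real tabular dataset).
train_records = [
    {"age": 52, "hours_per_week": 50, "education": "Masters"},
    {"age": 23, "hours_per_week": 20, "education": "HS-grad"},
    {"age": 45, "hours_per_week": 60, "education": "Bachelors"},
    {"age": 30, "hours_per_week": 35, "education": "HS-grad"},
]
train_labels = [1, 0, 1, 0]

# Simple downstream predictor trained on the rule-derived features
# (a linear model, as in the abstract's description).
clf = LogisticRegression().fit(featurize(train_records), train_labels)

# Inference: only the rules and the linear model are used; no LLM calls per sample.
test_records = [{"age": 38, "hours_per_week": 55, "education": "Doctorate"}]
print(clf.predict_proba(featurize(test_records)))
```

A design point worth noting from the abstract: because the LLM's role ends once the rules are discovered, inference cost and prompt-length limits no longer depend on the LLM, only on the lightweight predictive model.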

Cite this Paper
BibTeX
@InProceedings{pmlr-v235-han24f, title = {Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning}, author = {Han, Sungwon and Yoon, Jinsung and Arik, Sercan O and Pfister, Tomas}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {17454--17479}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/han24f/han24f.pdf}, url = {https://proceedings.mlr.press/v235/han24f.html}, abstract = {Large Language Models (LLMs), with their remarkable ability to tackle challenging and unseen reasoning problems, hold immense potential for tabular learning, that is vital for many real-world applications. In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input data set that is optimally suited for tabular predictions. The generated features are used to infer class likelihood with a simple downstream machine learning model, such as linear regression and yields high performance few-shot learning. The proposed FeatLLM framework only uses this simple predictive model with the discovered features at inference time. Compared to existing LLM-based approaches, FeatLLM eliminates the need to send queries to the LLM for each sample at inference time. Moreover, it merely requires API-level access to LLMs, and overcomes prompt size limitations. As demonstrated across numerous tabular datasets from a wide range of domains, FeatLLM generates high-quality rules, significantly (10% on average) outperforming alternatives such as TabLLM and STUNT.} }
Endnote
%0 Conference Paper %T Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning %A Sungwon Han %A Jinsung Yoon %A Sercan O Arik %A Tomas Pfister %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-han24f %I PMLR %P 17454--17479 %U https://proceedings.mlr.press/v235/han24f.html %V 235 %X Large Language Models (LLMs), with their remarkable ability to tackle challenging and unseen reasoning problems, hold immense potential for tabular learning, that is vital for many real-world applications. In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input data set that is optimally suited for tabular predictions. The generated features are used to infer class likelihood with a simple downstream machine learning model, such as linear regression and yields high performance few-shot learning. The proposed FeatLLM framework only uses this simple predictive model with the discovered features at inference time. Compared to existing LLM-based approaches, FeatLLM eliminates the need to send queries to the LLM for each sample at inference time. Moreover, it merely requires API-level access to LLMs, and overcomes prompt size limitations. As demonstrated across numerous tabular datasets from a wide range of domains, FeatLLM generates high-quality rules, significantly (10% on average) outperforming alternatives such as TabLLM and STUNT.
APA
Han, S., Yoon, J., Arik, S.O. & Pfister, T. (2024). Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:17454-17479. Available from https://proceedings.mlr.press/v235/han24f.html.

Related Material