TabSDS: a Lightweight, Fully Non-Parametric, and Model Free Approach for Generating Synthetic Tabular Data
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:7131-7195, 2025.
Abstract
The development of deep generative models for tabular data is currently a very active research area in machine learning. These models, however, tend to be computationally heavy and require careful tuning of multiple model parameters. In this paper, we propose TabSDS, a lightweight, non-parametric, and model-free alternative to tabular deep generative models that leverages rank and data shuffling transformations to generate synthetic data that closely approximates the joint probability distribution of the real data. We evaluate TabSDS against multiple baselines implemented in the Synthcity Python library across several datasets. TabSDS shows highly competitive performance against all baselines (including TabDDPM, a strong baseline model for tabular data generation). Importantly, TabSDS runs orders of magnitude faster than the deep generative baselines, and is also considerably faster than other computationally efficient baselines such as adversarial random forests.
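To make the "rank and data shuffling" idea concrete, the sketch below illustrates one generic way such transformations can be combined: each column is resampled independently to produce new marginal values, and the resampled values are then reordered according to the ranks of the real column so that the cross-column rank-dependence structure is approximately preserved. This is a hypothetical illustration of the general principle only, not the TabSDS algorithm as published; the function name `rank_shuffle_synthesize` and all implementation choices are assumptions for the sake of the example.

```python
# Illustrative sketch only: column-wise resampling followed by rank-based
# reordering. This is NOT the published TabSDS procedure; it merely shows how
# rank and shuffling transformations can yield synthetic rows whose marginals
# and rank dependence resemble those of the real table.
import numpy as np
import pandas as pd


def rank_shuffle_synthesize(real: pd.DataFrame, random_state: int = 0) -> pd.DataFrame:
    """Generate a synthetic table via bootstrap + rank reordering (toy example)."""
    rng = np.random.default_rng(random_state)
    n = len(real)
    synthetic = {}
    for col in real.columns:
        # 1. Resample the column with replacement to produce new marginal values.
        values = rng.choice(real[col].to_numpy(), size=n, replace=True)
        # 2. Sort the resampled values, then place them according to the rank
        #    order of the real column, so that between-column rank dependence
        #    is carried over to the synthetic data.
        order = real[col].rank(method="first").to_numpy().astype(int) - 1
        synthetic[col] = np.sort(values)[order]
    return pd.DataFrame(synthetic, columns=real.columns)


if __name__ == "__main__":
    # Toy numeric table standing in for a real dataset.
    rng = np.random.default_rng(1)
    real = pd.DataFrame({
        "age": rng.normal(40, 10, 500),
        "income": rng.lognormal(10, 0.5, 500),
    })
    fake = rank_shuffle_synthesize(real)
    print(fake.describe())
```

Because no model is fit and the only operations are resampling, sorting, and rank lookups, this style of generator runs in roughly O(n log n) time per column, which is consistent with the abstract's emphasis on execution times far below those of deep generative baselines.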