CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis

Chaejeong Lee, Jayoung Kim, Noseong Park
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:18940-18956, 2023.

Abstract

With growing attention to tabular data, synthetic tables are being applied to an ever-wider range of tasks. Owing to recent advances in generative modeling, the fake data produced by tabular data synthesis models has become sophisticated and realistic. However, modeling the discrete variables (columns) of tabular data remains difficult. In this work, we propose to process continuous and discrete variables separately, but conditioned on each other, with two diffusion models. The two diffusion models co-evolve during training by reading conditions from each other. To bind the diffusion models further, we also introduce a contrastive learning method with negative sampling. In experiments with 11 real-world tabular datasets and 8 baseline methods, we demonstrate the efficacy of the proposed method, called $\texttt{CoDi}$. Our code is available at https://github.com/ChaejeongLee/CoDi.
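The co-evolving idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: CoDi trains a score-based diffusion model for continuous columns and a multinomial diffusion model for discrete columns, whereas here simple linear maps (hypothetical parameters `W`, `V` and toy dimensions) stand in for the networks so that the mutual-conditioning control flow is visible: at each reverse step, each model reads the other model's current state as its condition.

```python
import numpy as np

rng = np.random.default_rng(0)
d_c, d_d = 3, 4  # toy sizes: 3 continuous features, 4 discrete categories

# Stand-in "denoiser" parameters (hypothetical; real CoDi uses neural nets).
W = 0.1 * rng.standard_normal((d_c, d_c + d_d))
V = 0.1 * rng.standard_normal((d_d, d_d + d_c))

def denoise_cont(x_c, cond_d):
    """Continuous model: refines x_c, conditioned on the discrete state."""
    return x_c - W @ np.concatenate([x_c, cond_d])

def denoise_disc(x_d, cond_c):
    """Discrete model: softmax over categories, conditioned on the continuous state."""
    logits = V @ np.concatenate([x_d, cond_c])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reverse_step(x_c, x_d):
    """One co-evolving reverse step: each model conditions on the other."""
    x_c_new = denoise_cont(x_c, x_d)          # continuous model reads discrete state
    x_d_new = denoise_disc(x_d, x_c_new)      # discrete model reads continuous state
    return x_c_new, x_d_new

# Start from "pure noise": Gaussian continuous part, uniform discrete part.
x_c = rng.standard_normal(d_c)
x_d = np.full(d_d, 1.0 / d_d)
for _ in range(5):
    x_c, x_d = reverse_step(x_c, x_d)

print(x_c.shape, float(x_d.sum()))  # discrete state stays a valid distribution
```

The same pairing is used during training, which is what lets the two models co-evolve; the paper's contrastive loss with negative sampling then ties their samples together further.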

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-lee23i,
  title     = {{C}o{D}i: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis},
  author    = {Lee, Chaejeong and Kim, Jayoung and Park, Noseong},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {18940--18956},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/lee23i/lee23i.pdf},
  url       = {https://proceedings.mlr.press/v202/lee23i.html},
  abstract  = {With growing attention to tabular data these days, the attempt to apply a synthetic table to various tasks has been expanded toward various scenarios. Owing to the recent advances in generative modeling, fake data generated by tabular data synthesis models become sophisticated and realistic. However, there still exists a difficulty in modeling discrete variables (columns) of tabular data. In this work, we propose to process continuous and discrete variables separately (but being conditioned on each other) by two diffusion models. The two diffusion models are co-evolved during training by reading conditions from each other. In order to further bind the diffusion models, moreover, we introduce a contrastive learning method with a negative sampling method. In our experiments with 11 real-world tabular datasets and 8 baseline methods, we prove the efficacy of the proposed method, called $\texttt{CoDi}$. Our code is available at https://github.com/ChaejeongLee/CoDi.}
}
Endnote
%0 Conference Paper
%T CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis
%A Chaejeong Lee
%A Jayoung Kim
%A Noseong Park
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-lee23i
%I PMLR
%P 18940--18956
%U https://proceedings.mlr.press/v202/lee23i.html
%V 202
%X With growing attention to tabular data these days, the attempt to apply a synthetic table to various tasks has been expanded toward various scenarios. Owing to the recent advances in generative modeling, fake data generated by tabular data synthesis models become sophisticated and realistic. However, there still exists a difficulty in modeling discrete variables (columns) of tabular data. In this work, we propose to process continuous and discrete variables separately (but being conditioned on each other) by two diffusion models. The two diffusion models are co-evolved during training by reading conditions from each other. In order to further bind the diffusion models, moreover, we introduce a contrastive learning method with a negative sampling method. In our experiments with 11 real-world tabular datasets and 8 baseline methods, we prove the efficacy of the proposed method, called $\texttt{CoDi}$. Our code is available at https://github.com/ChaejeongLee/CoDi.
APA
Lee, C., Kim, J. & Park, N. (2023). CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:18940-18956. Available from https://proceedings.mlr.press/v202/lee23i.html.