Testing Exchangeability between Real and Synthetic Data

Helena Löfström, Lars Carlsson, Ernst Ahlberg
Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications, PMLR 230:424-431, 2024.

Abstract

This study introduces a method to evaluate synthetic data quality by focusing on the exchangeability of real and synthetic datasets. This is done through the use of a test martingale, which provides a statistical guarantee of the similarity of the synthetic data’s representation of the original data distribution. The method was tested on six real-world datasets and their synthetic counterparts, revealing that traditional metrics such as statistical similarities and model performance may be misleading. The results indicate that the martingale test frequently rejects the hypothesis of data exchangeability, underscoring the need for more robust evaluation methods. The martingale-based evaluation offers a straightforward yet effective tool to ensure that synthetic data accurately reflects the original dataset, which is essential for effective model training and validation.
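
The abstract does not spell out how the test martingale is built, so the following is only a minimal sketch of one common construction from the conformal prediction literature, not the authors' implementation. It assumes the real and synthetic observations have already been reduced to a single sequence of numeric nonconformity scores (real first, synthetic after), computes smoothed conformal p-values, and multiplies them into a power martingale. The function names, the betting parameter `epsilon`, and the significance level `alpha` are all illustrative choices. By Ville's inequality, a test martingale exceeds 1/alpha with probability at most alpha when the sequence is exchangeable, which is what gives the rejection rule its guarantee.

```python
import math
import random


def conformal_p_values(scores, rng):
    """Smoothed conformal p-values for a sequence of nonconformity scores.

    Larger scores are treated as "stranger". Ties are broken with a
    uniform random number so the p-values are uniform under exchangeability.
    """
    p_values = []
    for n, current in enumerate(scores, start=1):
        seen = scores[:n]  # includes the current score
        greater = sum(1 for s in seen if s > current)
        equal = sum(1 for s in seen if s == current)
        p_values.append((greater + rng.random() * equal) / n)
    return p_values


def power_martingale_log_path(p_values, epsilon=0.5):
    """Running log-value of the power martingale prod(epsilon * p**(epsilon - 1))."""
    log_m = 0.0
    path = []
    for p in p_values:
        p = max(p, 1e-12)  # numerical guard against log(0); an approximation
        log_m += math.log(epsilon) + (epsilon - 1.0) * math.log(p)
        path.append(log_m)
    return path


def rejects_exchangeability(scores, alpha=0.05, epsilon=0.5, seed=0):
    """Reject if the martingale ever reaches 1/alpha (Ville's inequality)."""
    rng = random.Random(seed)
    p_values = conformal_p_values(list(scores), rng)
    path = power_martingale_log_path(p_values, epsilon)
    return max(path) >= math.log(1.0 / alpha)


if __name__ == "__main__":
    # Hypothetical data: "real" scores followed by shifted "synthetic" scores.
    data_rng = random.Random(1)
    real = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
    synthetic = [data_rng.gauss(3.0, 1.0) for _ in range(200)]
    print(rejects_exchangeability(real + synthetic))
```

Using the raw value as the score, as in the demo, only reacts to upward shifts; in practice the nonconformity measure would be chosen to reflect whatever differences between real and synthetic data matter for the application.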

Cite this Paper


BibTeX
@InProceedings{pmlr-v230-lofstrom24b,
  title = {Testing Exchangeability between Real and Synthetic Data},
  author = {L\"ofstr\"om, Helena and Carlsson, Lars and Ahlberg, Ernst},
  booktitle = {Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications},
  pages = {424--431},
  year = {2024},
  editor = {Vantini, Simone and Fontana, Matteo and Solari, Aldo and Bostr\"om, Henrik and Carlsson, Lars},
  volume = {230},
  series = {Proceedings of Machine Learning Research},
  month = {09--11 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v230/main/assets/lofstrom24b/lofstrom24b.pdf},
  url = {https://proceedings.mlr.press/v230/lofstrom24b.html},
  abstract = {This study introduces a method to evaluate synthetic data quality by focusing on the exchangeability of real and synthetic datasets. This is done through the use of a test martingale, which provides a statistical guarantee of the similarity of the synthetic data’s representation of the original data distribution. The method was tested on six real-world datasets and their synthetic counterparts, revealing that traditional metrics such as statistical similarities and model performance may be misleading. The results indicate that the martingale test frequently rejects the hypothesis of data exchangeability, underscoring the need for more robust evaluation methods. The martingale-based evaluation offers a straightforward yet effective tool to ensure that synthetic data accurately reflects the original dataset, which is essential for effective model training and validation.}
}
Endnote
%0 Conference Paper
%T Testing Exchangeability between Real and Synthetic Data
%A Helena Löfström
%A Lars Carlsson
%A Ernst Ahlberg
%B Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications
%C Proceedings of Machine Learning Research
%D 2024
%E Simone Vantini
%E Matteo Fontana
%E Aldo Solari
%E Henrik Boström
%E Lars Carlsson
%F pmlr-v230-lofstrom24b
%I PMLR
%P 424--431
%U https://proceedings.mlr.press/v230/lofstrom24b.html
%V 230
%X This study introduces a method to evaluate synthetic data quality by focusing on the exchangeability of real and synthetic datasets. This is done through the use of a test martingale, which provides a statistical guarantee of the similarity of the synthetic data’s representation of the original data distribution. The method was tested on six real-world datasets and their synthetic counterparts, revealing that traditional metrics such as statistical similarities and model performance may be misleading. The results indicate that the martingale test frequently rejects the hypothesis of data exchangeability, underscoring the need for more robust evaluation methods. The martingale-based evaluation offers a straightforward yet effective tool to ensure that synthetic data accurately reflects the original dataset, which is essential for effective model training and validation.
APA
Löfström, H., Carlsson, L. & Ahlberg, E. (2024). Testing Exchangeability between Real and Synthetic Data. Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications, in Proceedings of Machine Learning Research 230:424-431. Available from https://proceedings.mlr.press/v230/lofstrom24b.html.
