Foundations of Bayesian Learning from Synthetic Data

Harrison Wilde, Jack Jewson, Sebastian Vollmer, Chris Holmes
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:541-549, 2021.

Abstract

There is significant growth and interest in the use of synthetic data as an enabler for machine learning in environments where the release of real data is restricted due to privacy or availability constraints. Despite a large number of methods for synthetic data generation, there are comparatively few results on the statistical properties of models learnt on synthetic data, and fewer still for situations where a researcher wishes to augment real data with another party’s synthesised data. We use a Bayesian paradigm to characterise the updating of model parameters when learning in these settings, demonstrating that caution should be taken when applying conventional learning algorithms without appropriate consideration of the synthetic data generating process and learning task at hand. Recent results from general Bayesian updating support a novel and robust approach to Bayesian synthetic-learning founded on decision theory that outperforms standard approaches across repeated experiments on supervised learning and inference problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-wilde21a,
  title     = {Foundations of Bayesian Learning from Synthetic Data},
  author    = {Wilde, Harrison and Jewson, Jack and Vollmer, Sebastian and Holmes, Chris},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {541--549},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/wilde21a/wilde21a.pdf},
  url       = {http://proceedings.mlr.press/v130/wilde21a.html},
  abstract  = {There is significant growth and interest in the use of synthetic data as an enabler for machine learning in environments where the release of real data is restricted due to privacy or availability constraints. Despite a large number of methods for synthetic data generation, there are comparatively few results on the statistical properties of models learnt on synthetic data, and fewer still for situations where a researcher wishes to augment real data with another party’s synthesised data. We use a Bayesian paradigm to characterise the updating of model parameters when learning in these settings, demonstrating that caution should be taken when applying conventional learning algorithms without appropriate consideration of the synthetic data generating process and learning task at hand. Recent results from general Bayesian updating support a novel and robust approach to Bayesian synthetic-learning founded on decision theory that outperforms standard approaches across repeated experiments on supervised learning and inference problems.}
}
Endnote
%0 Conference Paper
%T Foundations of Bayesian Learning from Synthetic Data
%A Harrison Wilde
%A Jack Jewson
%A Sebastian Vollmer
%A Chris Holmes
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-wilde21a
%I PMLR
%P 541--549
%U http://proceedings.mlr.press/v130/wilde21a.html
%V 130
%X There is significant growth and interest in the use of synthetic data as an enabler for machine learning in environments where the release of real data is restricted due to privacy or availability constraints. Despite a large number of methods for synthetic data generation, there are comparatively few results on the statistical properties of models learnt on synthetic data, and fewer still for situations where a researcher wishes to augment real data with another party’s synthesised data. We use a Bayesian paradigm to characterise the updating of model parameters when learning in these settings, demonstrating that caution should be taken when applying conventional learning algorithms without appropriate consideration of the synthetic data generating process and learning task at hand. Recent results from general Bayesian updating support a novel and robust approach to Bayesian synthetic-learning founded on decision theory that outperforms standard approaches across repeated experiments on supervised learning and inference problems.
APA
Wilde, H., Jewson, J., Vollmer, S. & Holmes, C. (2021). Foundations of Bayesian Learning from Synthetic Data. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:541-549. Available from http://proceedings.mlr.press/v130/wilde21a.html.