Constructing Simulation Data with Dependency Structure for Unreliable Single-Cell RNA-Sequencing Data Using Copulas

Cornelia Fuetterer; Georg Schollmeyer; Thomas Augustin

Proceedings of the Eleventh International Symposium on Imprecise Probabilities: Theories and Applications, PMLR 103:216-224, 2019.

Abstract

Simulation studies are becoming increasingly important for the evaluation of complex statistical methods, yet they tend to represent idealized situations. With our framework, which incorporates dependency structures using copulas, we propose multidimensional simulation data with marginals based on different degrees of heterogeneity, which are built on different ranges of distribution parameters of a zero-inflated negative binomial distribution. The resulting higher and lower variation of the simulation data allows us to create lower and upper distribution functions, which lead to simulation data containing extreme points for each observation. Our approach aims to be closer to reality by taking data distortion into account. It offers a way to examine classification quality under measurement distortions in gene expression data and might suggest specific instructions for calibrating measuring instruments.
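The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian copula (the abstract only says "copulas"), two genes, and made-up parameter values. The names `zinb_quantile`, `simulate_pair`, `LOWER`, and `UPPER` are hypothetical. Each observation is drawn once from the copula and then mapped through two zero-inflated negative binomial (ZINB) quantile functions, one per parameter setting, which yields a lower and an upper count for every gene.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def nb_pmf(k, size, mu):
    """Negative binomial pmf with dispersion `size` and mean `mu` (mu > 0)."""
    p = size / (size + mu)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def zinb_quantile(u, pi0, size, mu, k_max=10_000):
    """Smallest count k whose ZINB CDF, pi0 + (1 - pi0) * F_NB(k), reaches u."""
    nb_cdf = 0.0
    for k in range(k_max + 1):
        nb_cdf += nb_pmf(k, size, mu)
        if pi0 + (1.0 - pi0) * nb_cdf >= u:
            return k
    return k_max  # numerical safeguard for u extremely close to 1


def simulate_pair(n, rho, lower, upper, seed=0):
    """Draw n observations for two genes linked by a Gaussian copula (assumed).

    `lower` and `upper` hold one ZINB parameter dict per gene. Each entry of
    the result is a (low_count, high_count) pair obtained from the same
    copula draw, so the dependency between genes is shared by both bounds.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        row = []
        for z, lo_par, hi_par in zip((z1, z2), lower, upper):
            u = normal_cdf(z)  # uniform marginal of the copula
            row.append((zinb_quantile(u, **lo_par), zinb_quantile(u, **hi_par)))
        rows.append(row)
    return rows


# Hypothetical parameter ranges: per gene, the "upper" setting has less zero
# inflation and a larger mean at equal dispersion, so its counts dominate.
LOWER = [dict(pi0=0.5, size=2.0, mu=3.0), dict(pi0=0.4, size=1.5, mu=8.0)]
UPPER = [dict(pi0=0.3, size=2.0, mu=6.0), dict(pi0=0.2, size=1.5, mu=12.0)]

for row in simulate_pair(n=5, rho=0.7, lower=LOWER, upper=UPPER, seed=1):
    print(row)
```

Because both bounds reuse the same uniform value, the interval for each observation reflects only the change in marginal parameters, not extra sampling noise. Other copula families or parameter choices would fit the same structure.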

Cite this Paper

BibTeX


@InProceedings{pmlr-v103-fuetterer19a,
  title = 	 {Constructing Simulation Data with Dependency Structure for Unreliable Single-Cell RNA-Sequencing Data Using Copulas},
  author =       {Fuetterer, Cornelia and Schollmeyer, Georg and Augustin, Thomas},
  booktitle = 	 {Proceedings of the Eleventh International Symposium on Imprecise Probabilities: Theories and Applications},
  pages = 	 {216--224},
  year = 	 {2019},
  editor = 	 {De Bock, Jasper and de Campos, Cassio P. and de Cooman, Gert and Quaeghebeur, Erik and Wheeler, Gregory},
  volume = 	 {103},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {03--06 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v103/fuetterer19a/fuetterer19a.pdf},
  url = 	 {https://proceedings.mlr.press/v103/fuetterer19a.html},
  abstract = 	 {Simulation studies are becoming increasingly important for the evaluation of complex statistical methods, yet they tend to represent idealized situations. With our framework, which incorporates dependency structures using copulas, we propose multidimensional simulation data with marginals based on different degrees of heterogeneity, which are built on different ranges of distribution parameters of a zero-inflated negative binomial distribution. The resulting higher and lower variation of the simulation data allows us to create lower and upper distribution functions, which lead to simulation data containing extreme points for each observation. Our approach aims to be closer to reality by taking data distortion into account. It offers a way to examine classification quality under measurement distortions in gene expression data and might suggest specific instructions for calibrating measuring instruments.}
}

Endnote

%0 Conference Paper
%T Constructing Simulation Data with Dependency Structure for Unreliable Single-Cell RNA-Sequencing Data Using Copulas
%A Cornelia Fuetterer
%A Georg Schollmeyer
%A Thomas Augustin
%B Proceedings of the Eleventh International Symposium on Imprecise Probabilities: Theories and Applications
%C Proceedings of Machine Learning Research
%D 2019
%E Jasper De Bock
%E Cassio P. de Campos
%E Gert de Cooman
%E Erik Quaeghebeur
%E Gregory Wheeler
%F pmlr-v103-fuetterer19a
%I PMLR
%P 216--224
%U https://proceedings.mlr.press/v103/fuetterer19a.html
%V 103
%X Simulation studies are becoming increasingly important for the evaluation of complex statistical methods, yet they tend to represent idealized situations. With our framework, which incorporates dependency structures using copulas, we propose multidimensional simulation data with marginals based on different degrees of heterogeneity, which are built on different ranges of distribution parameters of a zero-inflated negative binomial distribution. The resulting higher and lower variation of the simulation data allows us to create lower and upper distribution functions, which lead to simulation data containing extreme points for each observation. Our approach aims to be closer to reality by taking data distortion into account. It offers a way to examine classification quality under measurement distortions in gene expression data and might suggest specific instructions for calibrating measuring instruments.

APA


Fuetterer, C., Schollmeyer, G. & Augustin, T. (2019). Constructing Simulation Data with Dependency Structure for Unreliable Single-Cell RNA-Sequencing Data Using Copulas. Proceedings of the Eleventh International Symposium on Imprecise Probabilities: Theories and Applications, in Proceedings of Machine Learning Research 103:216-224. Available from https://proceedings.mlr.press/v103/fuetterer19a.html.
