Statistically significant subgraphs for genome-wide association study

Jun Sese; Aika Terada; Yuki Saito; Koji Tsuda

Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD, PMLR 47:29-36, 2015.

Abstract

Genome-wide association studies (GWAS) have been widely used for understanding the associations of single-nucleotide polymorphisms (SNPs) with a disease. GWAS data are often combined with known biological networks, and they have been analyzed using graph-mining techniques toward a systems understanding of the biological changes caused by the SNPs. To determine which subgraphs are associated with the disease, a statistical test on each subgraph needs to be conducted. However, no statistically significant results were found because multiple testing correction causes an extremely small corrected significance level. We introduce a method called gLAMP to enumerate subgraphs having statistically significant associations with a diagnosis. gLAMP integrates the Limitless Arity Multiple-testing Procedure (LAMP) with a graph-mining algorithm called COmmon Itemset Network mining (COIN). LAMP gives us the smallest possible Bonferroni factor, and COIN provides us with efficient enumeration of testable subgraphs. Theoretical results of their combination show the potential to enumerate subgraphs statistically significantly associated with a disease.
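The multiple-testing problem described in the abstract can be illustrated with a small sketch. It is not the authors' gLAMP, LAMP, or COIN implementation. It is a brute-force illustration of the Tarone-style "testability" idea that LAMP builds on, assuming a one-sided Fisher's exact test. The function names, the toy support counts, and the sample sizes are all hypothetical.

```python
from math import comb


def bonferroni_threshold(alpha, num_tests):
    """Plain Bonferroni: divide the significance level by the number of tests."""
    return alpha / num_tests


def min_attainable_p(support, n_pos, n_total):
    """Smallest one-sided Fisher p-value a pattern can reach.

    `support` is the number of samples containing the pattern (hypothetical
    input). The most extreme table puts as many pattern-carrying samples as
    possible among the `n_pos` cases.
    """
    a = min(support, n_pos)
    return comb(n_pos, a) * comb(n_total - n_pos, support - a) / comb(n_total, support)


def tarone_factor(alpha, supports, n_pos, n_total):
    """Smallest k such that at most k patterns can reach a p-value <= alpha / k.

    Patterns whose minimum attainable p-value exceeds alpha / k can never be
    significant, so they need not be counted in the correction factor.
    """
    min_ps = [min_attainable_p(s, n_pos, n_total) for s in supports]
    for k in range(1, len(min_ps) + 1):
        testable = sum(p <= alpha / k for p in min_ps)
        if testable <= k:
            return k
    return max(1, len(min_ps))


# Toy data (made up): 100 candidate subgraphs, most of them very rare.
supports = [1] * 90 + [5] * 5 + [8] * 5
alpha, n_pos, n_total = 0.05, 10, 20

naive = bonferroni_threshold(alpha, len(supports))
factor = tarone_factor(alpha, supports, n_pos, n_total)
print(naive, factor, alpha / factor)
```

With these made-up inputs, plain Bonferroni tests at 0.05/100 = 0.0005, whereas the testability-based factor is 5, giving a threshold of 0.01. This sketch scans a precomputed list of supports. Per the abstract, gLAMP instead relies on COIN to enumerate testable subgraphs efficiently.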

Cite this Paper

BibTeX


@InProceedings{pmlr-v47-sese14a,
  title = 	 {Statistically significant subgraphs for genome-wide association study},
  author = 	 {Sese, Jun and Terada, Aika and Saito, Yuki and Tsuda, Koji},
  booktitle = 	 {Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD},
  pages = 	 {29--36},
  year = 	 {2015},
  editor = 	 {Hämäläinen, Wilhelmiina and Petitjean, François and Webb, Geoffrey I.},
  volume = 	 {47},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Nancy, France},
  month = 	 {15 Sep},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v47/sese14a.pdf},
  url = 	 {https://proceedings.mlr.press/v47/sese14a.html},
  abstract = 	 {Genome-wide association studies (GWAS) have been widely used for understanding the associations of single-nucleotide polymorphisms (SNPs) with a disease. GWAS data are often combined with known biological networks, and they have been analyzed using graph-mining techniques toward a systems understanding of the biological changes caused by the SNPs. To determine which subgraphs are associated with the disease, a statistical test on each subgraph needs to be conducted. However, no statistically significant results were found because multiple testing correction causes an extremely small corrected significance level. We introduce a method called gLAMP to enumerate subgraphs having statistically significant associations with a diagnosis. gLAMP integrates the Limitless Arity Multiple-testing Procedure (LAMP) with a graph-mining algorithm called COmmon Itemset Network mining (COIN). LAMP gives us the smallest possible Bonferroni factor, and COIN provides us with efficient enumeration of testable subgraphs. Theoretical results of their combination show the potential to enumerate subgraphs statistically significantly associated with a disease.}
}

Endnote

%0 Conference Paper
%T Statistically significant subgraphs for genome-wide association study
%A Jun Sese
%A Aika Terada
%A Yuki Saito
%A Koji Tsuda
%B Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD
%C Proceedings of Machine Learning Research
%D 2015
%E Wilhelmiina Hämäläinen
%E François Petitjean
%E Geoffrey I. Webb
%F pmlr-v47-sese14a
%I PMLR
%P 29--36
%U https://proceedings.mlr.press/v47/sese14a.html
%V 47
%X Genome-wide association studies (GWAS) have been widely used for understanding the associations of single-nucleotide polymorphisms (SNPs) with a disease. GWAS data are often combined with known biological networks, and they have been analyzed using graph-mining techniques toward a systems understanding of the biological changes caused by the SNPs. To determine which subgraphs are associated with the disease, a statistical test on each subgraph needs to be conducted. However, no statistically significant results were found because multiple testing correction causes an extremely small corrected significance level. We introduce a method called gLAMP to enumerate subgraphs having statistically significant associations with a diagnosis. gLAMP integrates the Limitless Arity Multiple-testing Procedure (LAMP) with a graph-mining algorithm called COmmon Itemset Network mining (COIN). LAMP gives us the smallest possible Bonferroni factor, and COIN provides us with efficient enumeration of testable subgraphs. Theoretical results of their combination show the potential to enumerate subgraphs statistically significantly associated with a disease.

RIS


TY  - CPAPER
TI  - Statistically significant subgraphs for genome-wide association study
AU  - Jun Sese
AU  - Aika Terada
AU  - Yuki Saito
AU  - Koji Tsuda
BT  - Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD
DA  - 2015/11/27
ED  - Wilhelmiina Hämäläinen
ED  - François Petitjean
ED  - Geoffrey I. Webb
ID  - pmlr-v47-sese14a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 47
SP  - 29
EP  - 36
L1  - http://proceedings.mlr.press/v47/sese14a.pdf
UR  - https://proceedings.mlr.press/v47/sese14a.html
AB  - Genome-wide association studies (GWAS) have been widely used for understanding the associations of single-nucleotide polymorphisms (SNPs) with a disease. GWAS data are often combined with known biological networks, and they have been analyzed using graph-mining techniques toward a systems understanding of the biological changes caused by the SNPs. To determine which subgraphs are associated with the disease, a statistical test on each subgraph needs to be conducted. However, no statistically significant results were found because multiple testing correction causes an extremely small corrected significance level. We introduce a method called gLAMP to enumerate subgraphs having statistically significant associations with a diagnosis. gLAMP integrates the Limitless Arity Multiple-testing Procedure (LAMP) with a graph-mining algorithm called COmmon Itemset Network mining (COIN). LAMP gives us the smallest possible Bonferroni factor, and COIN provides us with efficient enumeration of testable subgraphs. Theoretical results of their combination show the potential to enumerate subgraphs statistically significantly associated with a disease.
ER  -

APA


Sese, J., Terada, A., Saito, Y. & Tsuda, K. (2015). Statistically significant subgraphs for genome-wide association study. Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD, in Proceedings of Machine Learning Research 47:29-36. Available from https://proceedings.mlr.press/v47/sese14a.html.
