Context Aware Group Nearest Shrunken Centroids in Large-Scale Genomic Studies

Juemin Yang, Fang Han, Rafael Irizarry, Han Liu
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:1051-1059, 2014.

Abstract

Recent genomic studies have identified genes related to specific phenotypes. In addition to marginal association analysis for individual genes, analyzing gene pathways (functionally related sets of genes) may yield additional valuable insights. We have devised an approach to phenotype classification from gene expression profiling. Our method named “group Nearest Shrunken Centroids (gNSC)” is an enhancement of the Nearest Shrunken Centroids (NSC) which is a popular and scalable method to analyze big data. While fully utilizing the variable structure of gene pathways, gNSC shares comparable computational speed as NSC if the group size is small. Comparing with NSC, gNSC improves the power of classification by utilizing the gene pathway information. In practice, we investigate the performance of gNSC on one of the largest microarray datasets aggregated from the internet. We show the effectiveness of our method by comparing the misclassification rate of gNSC with that of NSC. Additionally, we present a novel application of NSC/gNSC on context analysis of association between pathways and certain medical words. Some newest biological findings are rediscovered.

Cite this Paper


BibTeX
@InProceedings{pmlr-v33-yang14b, title = {{Context Aware Group Nearest Shrunken Centroids in Large-Scale Genomic Studies}}, author = {Yang, Juemin and Han, Fang and Irizarry, Rafael and Liu, Han}, booktitle = {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics}, pages = {1051--1059}, year = {2014}, editor = {Kaski, Samuel and Corander, Jukka}, volume = {33}, series = {Proceedings of Machine Learning Research}, address = {Reykjavik, Iceland}, month = {22--25 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v33/yang14b.pdf}, url = {https://proceedings.mlr.press/v33/yang14b.html}, abstract = {Recent genomic studies have identified genes related to specific phenotypes. In addition to marginal association analysis for individual genes, analyzing gene pathways (functionally related sets of genes) may yield additional valuable insights. We have devised an approach to phenotype classification from gene expression profiling. Our method named “group Nearest Shrunken Centroids (gNSC)” is an enhancement of the Nearest Shrunken Centroids (NSC) which is a popular and scalable method to analyze big data. While fully utilizing the variable structure of gene pathways, gNSC shares comparable computational speed as NSC if the group size is small. Comparing with NSC, gNSC improves the power of classification by utilizing the gene pathway information. In practice, we investigate the performance of gNSC on one of the largest microarray datasets aggregated from the internet. We show the effectiveness of our method by comparing the misclassification rate of gNSC with that of NSC. Additionally, we present a novel application of NSC/gNSC on context analysis of association between pathways and certain medical words. Some newest biological findings are rediscovered.} }
Endnote
%0 Conference Paper %T Context Aware Group Nearest Shrunken Centroids in Large-Scale Genomic Studies %A Juemin Yang %A Fang Han %A Rafael Irizarry %A Han Liu %B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2014 %E Samuel Kaski %E Jukka Corander %F pmlr-v33-yang14b %I PMLR %P 1051--1059 %U https://proceedings.mlr.press/v33/yang14b.html %V 33 %X Recent genomic studies have identified genes related to specific phenotypes. In addition to marginal association analysis for individual genes, analyzing gene pathways (functionally related sets of genes) may yield additional valuable insights. We have devised an approach to phenotype classification from gene expression profiling. Our method named “group Nearest Shrunken Centroids (gNSC)” is an enhancement of the Nearest Shrunken Centroids (NSC) which is a popular and scalable method to analyze big data. While fully utilizing the variable structure of gene pathways, gNSC shares comparable computational speed as NSC if the group size is small. Comparing with NSC, gNSC improves the power of classification by utilizing the gene pathway information. In practice, we investigate the performance of gNSC on one of the largest microarray datasets aggregated from the internet. We show the effectiveness of our method by comparing the misclassification rate of gNSC with that of NSC. Additionally, we present a novel application of NSC/gNSC on context analysis of association between pathways and certain medical words. Some newest biological findings are rediscovered.
RIS
TY - CPAPER TI - Context Aware Group Nearest Shrunken Centroids in Large-Scale Genomic Studies AU - Juemin Yang AU - Fang Han AU - Rafael Irizarry AU - Han Liu BT - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics DA - 2014/04/02 ED - Samuel Kaski ED - Jukka Corander ID - pmlr-v33-yang14b PB - PMLR DP - Proceedings of Machine Learning Research VL - 33 SP - 1051 EP - 1059 L1 - http://proceedings.mlr.press/v33/yang14b.pdf UR - https://proceedings.mlr.press/v33/yang14b.html AB - Recent genomic studies have identified genes related to specific phenotypes. In addition to marginal association analysis for individual genes, analyzing gene pathways (functionally related sets of genes) may yield additional valuable insights. We have devised an approach to phenotype classification from gene expression profiling. Our method named “group Nearest Shrunken Centroids (gNSC)” is an enhancement of the Nearest Shrunken Centroids (NSC) which is a popular and scalable method to analyze big data. While fully utilizing the variable structure of gene pathways, gNSC shares comparable computational speed as NSC if the group size is small. Comparing with NSC, gNSC improves the power of classification by utilizing the gene pathway information. In practice, we investigate the performance of gNSC on one of the largest microarray datasets aggregated from the internet. We show the effectiveness of our method by comparing the misclassification rate of gNSC with that of NSC. Additionally, we present a novel application of NSC/gNSC on context analysis of association between pathways and certain medical words. Some newest biological findings are rediscovered. ER -
APA
Yang, J., Han, F., Irizarry, R. & Liu, H.. (2014). Context Aware Group Nearest Shrunken Centroids in Large-Scale Genomic Studies. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:1051-1059 Available from https://proceedings.mlr.press/v33/yang14b.html.

Related Material