On utility of gene set signatures in gene expression-based cancer class prediction


Minca Mramor, Marko Toplak, Gregor Leban, Tomaž Curk, Janez Demšar, Blaž Zupan
Proceedings of the third International Workshop on Machine Learning in Systems Biology, PMLR 8:55-64, 2009.


Machine learning methods that can use additional knowledge in their inference process are central to the development of integrative bioinformatics. Inclusion of background knowledge improves robustness, predictive accuracy and interpretability. Recently, several such techniques have been proposed that use information on gene sets for supervised data mining of class-labeled microarray data sets. Here we present a new gene set-based supervised learning approach named setsig and systematically compare the predictive accuracy of this and other gene set approaches with that of the standard inference model, in which only gene expression information is used. Our results indicate that setsig outperforms the other gene set approaches. Contrary to earlier reports, however, transforming gene expression data to the space of gene set signatures does not increase the accuracy of predictive models compared with models trained directly on the original (untransformed) data.
