- title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology: Revised Selected Papers' abstract: 'MLSB09, the Third International Workshop on Machine Learning in Systems Biology, was held in Ljubljana, Slovenia, on September 5-6, 2009, at the Jožef Stefan Institute. This volume contains revised selected papers presented at the workshop. The technical program of the workshop consisted of 6 invited lectures, 12 oral presentations and 22 poster presentations. All the lectures were recorded and are available for viewing via the videolectures.net portal. More information on the workshop can be found at mlsb09.ijs.si.' volume: 8 URL: https://proceedings.mlr.press/v8/dzeroski10a.html PDF: http://proceedings.mlr.press/v8/dzeroski10a/dzeroski10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-dzeroski10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 1-2 id: dzeroski10a issued: date-parts: - 2009 - 3 - 2 firstpage: 1 lastpage: 2 published: 2009-03-02 00:00:00 +0000 - title: 'A comparison of AUC estimators in small-sample studies' abstract: 'Reliable estimation of the classification performance of learned predictive models is difficult in the small-sample setting. When dealing with biological data, it is often the case that separate test data cannot be afforded. In this case, cross-validation is a typical strategy for estimating performance. 
Recent results, further supported by experimental evidence presented in this article, show that many standard approaches to cross-validation suffer from substantial bias or variance when the area under the ROC curve (AUC) is used as the performance measure. We advocate the use of leave-pair-out cross-validation (LPOCV) for performance estimation, as it avoids many of these problems. A method previously proposed by us can be used to efficiently calculate this estimate for regularized least-squares (RLS) based learners.' volume: 8 URL: https://proceedings.mlr.press/v8/airola10a.html PDF: http://proceedings.mlr.press/v8/airola10a/airola10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-airola10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Antti family: Airola - given: Tapio family: Pahikkala - given: Willem family: Waegeman - given: Bernard family: De Baets - given: Tapio family: Salakoski editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 3-13 id: airola10a issued: date-parts: - 2009 - 3 - 2 firstpage: 3 lastpage: 13 published: 2009-03-02 00:00:00 +0000 - title: 'Hierarchical Cost-Sensitive Algorithms for Genome-Wide Gene Function Prediction' abstract: 'In this work we propose new ensemble methods for the hierarchical classification of gene functions. Our methods exploit the hierarchical relationships between the classes in different ways: each ensemble node is trained “locally”, according to its position in the hierarchy; moreover, in the evaluation phase the set of predicted annotations is built so as to minimize a global loss function defined over the hierarchy. We also address the sparsity of annotations by introducing a cost-sensitive parameter that allows one to control the precision-recall trade-off. 
Experiments with the model organism S. cerevisiae, using the FunCat taxonomy and seven biomolecular data sets, reveal a significant advantage of our techniques over “flat” and cost-insensitive hierarchical ensembles.' volume: 8 URL: https://proceedings.mlr.press/v8/cesa-bianchi10a.html PDF: http://proceedings.mlr.press/v8/cesa-bianchi10a/cesa-bianchi10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-cesa-bianchi10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Nicolò family: Cesa-Bianchi - given: Giorgio family: Valentini editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 14-29 id: cesa-bianchi10a issued: date-parts: - 2009 - 3 - 2 firstpage: 14 lastpage: 29 published: 2009-03-02 00:00:00 +0000 - title: 'Evaluation of a Bayesian model-based approach in GA studies' abstract: 'In a typical Genetic Association Study (GAS), several hundred to millions of genomic variables are measured and tested for association with a given set of phenotypic variables (e.g., a given disease state or a complete expression profile), with the aim of identifying the genetic background of complex, multifactorial diseases. These highly varying requirements have resulted in a number of statistical tools that apply different approaches, either Bayesian or non-Bayesian, model-based or conditional. In this paper we evaluate dedicated GAS tools and general-purpose feature subset selection (FSS) tools, including the Bayesian model-based tool BMLA, in a GAS context. In the evaluation we used an artificial data set generated from a reference model with 113 genotypic variables that was based on real-world genotype data.' 
volume: 8 URL: https://proceedings.mlr.press/v8/hullam10a.html PDF: http://proceedings.mlr.press/v8/hullam10a/hullam10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-hullam10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Gábor family: Hullám - given: Péter family: Antal - given: Csaba family: Szalai - given: András family: Falus editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 30-43 id: hullam10a issued: date-parts: - 2009 - 3 - 2 firstpage: 30 lastpage: 43 published: 2009-03-02 00:00:00 +0000 - title: 'Evaluation of Signaling Cascades Based on the Weights from Microarray and ChIP-seq Data' abstract: 'In this study, we combined ChIP-seq and transcriptome data and integrated them into signaling cascades. Integration was realized through a framework based on a hybrid data- and model-driven approach. An enrichment model was constructed to evaluate signaling cascades that result in specific cellular processes. We used ChIP-seq and microarray data from public databases, obtained from HeLa cells under oxidative stress with similar experimental setups. Both ChIP-seq and array data were analyzed by percentile ranking to allow simultaneous data integration on specific genes. Signaling cascades from the KEGG pathway database were subsequently scored by taking the sum of the individual scores of the genes involved in the cascade. This scoring information is then propagated along the route of the signaling cascade to form the final score. The signaling-cascade-model-based framework described in this study is a novel approach that calculates scores for the target process of the analyzed signaling cascade, rather than assigning scores to gene product nodes.' 
volume: 8 URL: https://proceedings.mlr.press/v8/isik10a.html PDF: http://proceedings.mlr.press/v8/isik10a/isik10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-isik10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Zerrin family: Isik - given: Volkan family: Atalay - given: Rengul family: Cetin-Atalay editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 44-54 id: isik10a issued: date-parts: - 2009 - 3 - 2 firstpage: 44 lastpage: 54 published: 2009-03-02 00:00:00 +0000 - title: 'On utility of gene set signatures in gene expression-based cancer class prediction' abstract: 'Machine learning methods that can use additional knowledge in their inference process are central to the development of integrative bioinformatics. Inclusion of background knowledge improves robustness, predictive accuracy and interpretability. Recently, a set of such techniques has been proposed that use information on gene sets for supervised data mining of class-labeled microarray data sets. Here we present a new gene set-based supervised learning approach named SetSig and systematically investigate the predictive accuracy of this and other gene set approaches compared to the standard inference model where only gene expression information is used. Our results indicate that SetSig outperforms other gene set approaches, but contrary to earlier reports, transformation of gene expression data to the space of gene set signatures does not result in increased accuracy of predictive models when compared to those trained directly from original (not transformed) data.' 
volume: 8 URL: https://proceedings.mlr.press/v8/mramor10a.html PDF: http://proceedings.mlr.press/v8/mramor10a/mramor10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-mramor10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Minca family: Mramor - given: Marko family: Toplak - given: Gregor family: Leban - given: Tomaž family: Curk - given: Janez family: Demšar - given: Blaž family: Zupan editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 55-64 id: mramor10a issued: date-parts: - 2009 - 3 - 2 firstpage: 55 lastpage: 64 published: 2009-03-02 00:00:00 +0000 - title: 'Accuracy-Rejection Curves (ARCs) for Comparing Classification Methods with a Reject Option' abstract: 'Data extracted from microarrays are now considered an important source of knowledge about various diseases. Several studies based on microarray data and the use of receiver operating characteristic (ROC) graphs have compared supervised machine learning approaches. These comparisons are based on classification schemes in which all samples are classified, regardless of the degree of confidence associated with the classification of a particular sample on the basis of a given classifier. In the domain of healthcare, it is safer to refrain from classifying a sample if the confidence assigned to the classification is not high enough, rather than classifying all samples even if confidence is low. We describe an approach in which the performance of different classifiers is compared, with the possibility of rejection, based on several reject areas. Using a tradeoff between accuracy and rejection, we propose the use of accuracy-rejection curves (ARCs) and three types of relationship between ARCs for comparing the ARCs of two classifiers. 
Empirical results based on purely synthetic data, semi-synthetic data (generated from real data obtained from patients) and public microarray data for binary classification problems demonstrate the efficacy of this method.' volume: 8 URL: https://proceedings.mlr.press/v8/nadeem10a.html PDF: http://proceedings.mlr.press/v8/nadeem10a/nadeem10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-nadeem10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Malik Sajjad Ahmed family: Nadeem - given: Jean-Daniel family: Zucker - given: Blaise family: Hanczar editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 65-81 id: nadeem10a issued: date-parts: - 2009 - 3 - 2 firstpage: 65 lastpage: 81 published: 2009-03-02 00:00:00 +0000 - title: 'Predicting the functions of proteins in Protein-Protein Interaction networks from global information' abstract: 'In this work we present a novel approach to predict the function of proteins in protein-protein interaction (PPI) networks. We classify existing approaches into inductive and transductive approaches, and into local and global approaches. So far, among inductive approaches, only local ones have been proposed for protein function prediction. Here we introduce a protein description formalism that also includes global information, namely information that locates a protein relative to specific important proteins in the network. We analyze how the number of selected important proteins affects function prediction accuracy. With around 70 important proteins, even in large graphs, our method makes good and stable predictions. Furthermore, we investigate whether our method also classifies proteins accurately at more detailed function levels. 
We examined up to five different function levels. The method is benchmarked on four datasets, where we found that classification performance, measured by F-measure, improves by 9 percent over the benchmark methods employed.' volume: 8 URL: https://proceedings.mlr.press/v8/rahmani10a.html PDF: http://proceedings.mlr.press/v8/rahmani10a/rahmani10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-rahmani10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Hossein family: Rahmani - given: Hendrik family: Blockeel - given: Andreas family: Bender editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 82-97 id: rahmani10a issued: date-parts: - 2009 - 3 - 2 firstpage: 82 lastpage: 97 published: 2009-03-02 00:00:00 +0000 - title: 'Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction' abstract: 'Several works have shown that biomolecular data integration is key to improving the prediction of gene functions. Quite surprisingly, little attention has been devoted to data integration for gene function prediction through ensemble methods. In this work we show that relatively simple ensemble methods are competitive and in some cases are also able to outperform state-of-the-art data integration techniques for gene function prediction.' 
volume: 8 URL: https://proceedings.mlr.press/v8/re10a.html PDF: http://proceedings.mlr.press/v8/re10a/re10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-re10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Matteo family: Ré - given: Giorgio family: Valentini editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 98-111 id: re10a issued: date-parts: - 2009 - 3 - 2 firstpage: 98 lastpage: 111 published: 2009-03-02 00:00:00 +0000 - title: 'Event based text mining for integrated network construction' abstract: 'The scientific literature is a rich and challenging data source for research in systems biology, providing numerous interactions between biological entities. Text mining techniques have become increasingly useful for extracting such information from the literature automatically, but so far text mining in the systems biology field has focused mostly on the discovery of protein-protein interactions. Here, we take this approach one step further, and use machine learning techniques combined with text mining to extract a much wider variety of interactions between biological entities. Each particular interaction type gives rise to a separate network, represented as a graph, all of which can be subsequently combined to yield a so-called integrated network representation. This provides a much broader view of the biological system as a whole, which can then be used in further investigations to analyse specific properties of the network.' 
volume: 8 URL: https://proceedings.mlr.press/v8/saeys10a.html PDF: http://proceedings.mlr.press/v8/saeys10a/saeys10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-saeys10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Yvan family: Saeys - given: Sofie family: Van Landeghem - given: Yves family: Van de Peer editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 112-121 id: saeys10a issued: date-parts: - 2009 - 3 - 2 firstpage: 112 lastpage: 121 published: 2009-03-02 00:00:00 +0000 - title: 'Evaluation Method for Feature Rankings and their Aggregations for Biomarker Discovery' abstract: 'In this paper we investigate the problem of evaluating ranked lists of biomarkers, which are typically an output of the analysis of high-throughput data. This can be a list of probes from microarray experiments, which are ordered by the strength of their correlation to a disease. The ordering of the biomarkers in the ranked lists usually varies considerably when the lists result from different studies or methods. Our work consists of two parts. First, we propose a method for evaluating the “correctness” of the ranked lists. Second, we conduct a preliminary study of different approaches for aggregating feature rankings, such as aggregating rankings produced by different ranking algorithms and from different datasets. We perform experiments on multiple public neuroblastoma microarray studies. Our results show that aggregating feature rankings is generally beneficial compared to rankings produced by a single study or a single method.' 
volume: 8 URL: https://proceedings.mlr.press/v8/slavkov10a.html PDF: http://proceedings.mlr.press/v8/slavkov10a/slavkov10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-slavkov10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Ivica family: Slavkov - given: Bernard family: Ženko - given: Sašo family: Džeroski editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 122-135 id: slavkov10a issued: date-parts: - 2009 - 3 - 2 firstpage: 122 lastpage: 135 published: 2009-03-02 00:00:00 +0000 - title: 'A Subgroup Discovery Approach for Relating Chemical Structure and Phenotype Data in Chemical Genomics' abstract: 'We report on the development of an algorithm that can infer relations between chemical structure and biochemical pathways from mutant-based growth fitness characterizations of small molecules. Identification of such relations is very important in drug discovery and development from the perspective of argument-based selection of candidate molecules in target-specific screenings, and early exclusion of substances with highly probable undesired side-effects. The algorithm uses a combination of unsupervised and supervised machine learning techniques and, besides experimental fitness data, uses knowledge of gene subgroups (pathways), structural descriptions of chemicals, and MeSH term-based chemical and pharmacological annotations. We demonstrate the utility of the proposed approach in the analysis of a genome-wide S. cerevisiae chemogenomics assay by Hillenmeyer et al. (Science, 2008).' 
volume: 8 URL: https://proceedings.mlr.press/v8/umek10a.html PDF: http://proceedings.mlr.press/v8/umek10a/umek10a.pdf edit: https://github.com/mlresearch//v8/edit/gh-pages/_posts/2009-03-02-umek10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Third International Workshop on Machine Learning in Systems Biology' publisher: 'PMLR' author: - given: Lan family: Umek - given: Petra family: Kaferle - given: Mojca family: Mattiazzi - given: Aleš family: Erjavec - given: Črtomir family: Gorup - given: Tomaž family: Curk - given: Uroš family: Petrovič - given: Blaž family: Zupan editor: - given: Sašo family: Džeroski - given: Pierre family: Geurts - given: Juho family: Rousu address: Ljubljana, Slovenia page: 136-144 id: umek10a issued: date-parts: - 2009 - 3 - 2 firstpage: 136 lastpage: 144 published: 2009-03-02 00:00:00 +0000