Selecting deep neural networks that yield consistent attribution-based interpretations for genomics

Antonio Majdandzic, Chandana Rajesh, Ziqi Tang, Shushan Toneyan, Ethan L. Labelson, Rohit K. Tripathy, Peter K. Koo
Proceedings of the 17th Machine Learning in Computational Biology meeting, PMLR 200:131-149, 2022.

Abstract

Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well. Thus, the standard approach for model selection, which relies on performance of a held-out validation set, does not guarantee that a high-performing DNN will provide reliable explanations. Here we introduce two approaches that quantify the consistency of important features across a population of attribution maps; consistency reflects a qualitative property of human interpretable attribution maps. We employ the consistency metrics as part of a multivariate model selection framework to identify models that yield high generalization performance and interpretable attribution analysis. We demonstrate the efficacy of this approach across various DNNs quantitatively with synthetic data and qualitatively with chromatin accessibility data.
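The abstract does not specify how the two consistency metrics are computed. As a loose, hypothetical illustration of the general idea (not the paper's actual metrics), the sketch below scores a population of attribution maps by the average pairwise Jaccard overlap of their top-k most important positions; maps that consistently highlight the same "motif" region score higher than maps dominated by spurious noise. All names and parameters here are assumptions for illustration only.

```python
import numpy as np

def pairwise_consistency(attr_maps, top_k=10):
    """Hypothetical consistency score: average pairwise Jaccard overlap
    of the top-k highest-magnitude positions across attribution maps."""
    top_sets = [set(np.argsort(-np.abs(m))[:top_k]) for m in attr_maps]
    scores = []
    for i in range(len(top_sets)):
        for j in range(i + 1, len(top_sets)):
            inter = len(top_sets[i] & top_sets[j])
            union = len(top_sets[i] | top_sets[j])
            scores.append(inter / union)
    return float(np.mean(scores))

rng = np.random.default_rng(0)

# Simulated attribution maps: a shared high-importance "motif" at
# positions 40-49, plus small per-map noise.
signal = np.zeros(100)
signal[40:50] = 5.0
consistent_maps = [signal + 0.1 * rng.normal(size=100) for _ in range(5)]

# Maps that are pure noise have no shared important features.
noisy_maps = [rng.normal(size=100) for _ in range(5)]

print(pairwise_consistency(consistent_maps))  # near 1.0
print(pairwise_consistency(noisy_maps))       # near 0.0
```

A metric of this shape can be combined with validation performance in a multivariate model-selection criterion, as the abstract describes; the paper itself should be consulted for the actual formulations.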

Cite this Paper


BibTeX
@InProceedings{pmlr-v200-majdandzic22a,
  title     = {Selecting deep neural networks that yield consistent attribution-based interpretations for genomics},
  author    = {Majdandzic, Antonio and Rajesh, Chandana and Tang, Ziqi and Toneyan, Shushan and Labelson, Ethan L. and Tripathy, Rohit K. and Koo, Peter K.},
  booktitle = {Proceedings of the 17th Machine Learning in Computational Biology meeting},
  pages     = {131--149},
  year      = {2022},
  editor    = {Knowles, David A. and Mostafavi, Sara and Lee, Su-In},
  volume    = {200},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--22 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v200/majdandzic22a/majdandzic22a.pdf},
  url       = {https://proceedings.mlr.press/v200/majdandzic22a.html},
  abstract  = {Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well. Thus, the standard approach for model selection, which relies on performance of a held-out validation set, does not guarantee that a high-performing DNN will provide reliable explanations. Here we introduce two approaches that quantify the consistency of important features across a population of attribution maps; consistency reflects a qualitative property of human interpretable attribution maps. We employ the consistency metrics as part of a multivariate model selection framework to identify models that yield high generalization performance and interpretable attribution analysis. We demonstrate the efficacy of this approach across various DNNs quantitatively with synthetic data and qualitatively with chromatin accessibility data.}
}
Endnote
%0 Conference Paper
%T Selecting deep neural networks that yield consistent attribution-based interpretations for genomics
%A Antonio Majdandzic
%A Chandana Rajesh
%A Ziqi Tang
%A Shushan Toneyan
%A Ethan L. Labelson
%A Rohit K. Tripathy
%A Peter K. Koo
%B Proceedings of the 17th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2022
%E David A. Knowles
%E Sara Mostafavi
%E Su-In Lee
%F pmlr-v200-majdandzic22a
%I PMLR
%P 131--149
%U https://proceedings.mlr.press/v200/majdandzic22a.html
%V 200
%X Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well. Thus, the standard approach for model selection, which relies on performance of a held-out validation set, does not guarantee that a high-performing DNN will provide reliable explanations. Here we introduce two approaches that quantify the consistency of important features across a population of attribution maps; consistency reflects a qualitative property of human interpretable attribution maps. We employ the consistency metrics as part of a multivariate model selection framework to identify models that yield high generalization performance and interpretable attribution analysis. We demonstrate the efficacy of this approach across various DNNs quantitatively with synthetic data and qualitatively with chromatin accessibility data.
APA
Majdandzic, A., Rajesh, C., Tang, Z., Toneyan, S., Labelson, E.L., Tripathy, R.K. & Koo, P.K. (2022). Selecting deep neural networks that yield consistent attribution-based interpretations for genomics. Proceedings of the 17th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 200:131-149. Available from https://proceedings.mlr.press/v200/majdandzic22a.html.