F-MoDA: Efficient Fourier-based motif discovery in attribution maps

Ofir Yaish, Yaron Orenstein
Proceedings of the 20th Machine Learning in Computational Biology meeting, PMLR 311:326-335, 2025.

Abstract

Deep neural networks have been transforming the field of bioinformatics and computational biology in recent years, especially in genomic-related tasks. Trained neural networks have enabled unprecedented capabilities in predicting molecular and genomic phenotypes. A fundamental step following deep-neural-network training and performance evaluation is the interpretation of the trained neural networks to learn new biology and validate the trained models. Techniques like integrated gradients have been highly successful in local interpretability, i.e., attributing importance to a specific residue in a given DNA, RNA, or amino acid sequence. But, there still remains the challenge of finding the global patterns that are shared among many sequences to understand the biological mechanism. Currently, TF-MoDISco is the only available method for this task. However, TF-MoDISco takes hours to run on standard datasets, and it reports many redundant and false motifs. Here, we present F-MoDA (Fourier-based Motif Discovery in Attribution maps), a novel computational method for efficiently and accurately discovering shared sequence motifs in residue-level attribution maps. F-MoDA leverages signal processing techniques and a hierarchical clustering approach to identify recurring regulatory patterns. We evaluated F-MoDA against TF-MoDISco over an established motif-finding benchmark and found that F-MoDA reports motifs that are more similar to the ground truth, in addition to reporting fewer redundant motifs and fewer false motifs. Moreover, F-MoDA runs much faster and uses less memory. We expect F-MoDA to be utilized in many studies applying deep neural networks to genomics data. F-MoDA is publicly available at https://github.com/OrensteinLab/F-MoDA.

Cite this Paper


BibTeX
@InProceedings{pmlr-v311-yaish25a,
  title = {F-MoDA: Efficient Fourier-based motif discovery in attribution maps},
  author = {Yaish, Ofir and Orenstein, Yaron},
  booktitle = {Proceedings of the 20th Machine Learning in Computational Biology meeting},
  pages = {326--335},
  year = {2025},
  editor = {Knowles, David A and Koo, Peter K},
  volume = {311},
  series = {Proceedings of Machine Learning Research},
  month = {10--11 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v311/main/assets/yaish25a/yaish25a.pdf},
  url = {https://proceedings.mlr.press/v311/yaish25a.html},
  abstract = {Deep neural networks have been transforming the field of bioinformatics and computational biology in recent years, especially in genomic-related tasks. Trained neural networks have enabled unprecedented capabilities in predicting molecular and genomic phenotypes. A fundamental step following deep-neural-network training and performance evaluation is the interpretation of the trained neural networks to learn new biology and validate the trained models. Techniques like integrated gradients have been highly successful in local interpretability, i.e., attributing importance to a specific residue in a given DNA, RNA, or amino acid sequence. But, there still remains the challenge of finding the global patterns that are shared among many sequences to understand the biological mechanism. Currently, TF-MoDISco is the only available method for this task. However, TF-MoDISco takes hours to run on standard datasets, and it reports many redundant and false motifs. Here, we present F-MoDA (Fourier-based Motif Discovery in Attribution maps), a novel computational method for efficiently and accurately discovering shared sequence motifs in residue-level attribution maps. F-MoDA leverages signal processing techniques and a hierarchical clustering approach to identify recurring regulatory patterns. We evaluated F-MoDA against TF-MoDISco over an established motif-finding benchmark and found that F-MoDA reports motifs that are more similar to the ground truth, in addition to reporting fewer redundant motifs and fewer false motifs. Moreover, F-MoDA runs much faster and uses less memory. We expect F-MoDA to be utilized in many studies applying deep neural networks to genomics data. F-MoDA is publicly available at https://github.com/OrensteinLab/F-MoDA.}
}
Endnote
%0 Conference Paper
%T F-MoDA: Efficient Fourier-based motif discovery in attribution maps
%A Ofir Yaish
%A Yaron Orenstein
%B Proceedings of the 20th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2025
%E David A Knowles
%E Peter K Koo
%F pmlr-v311-yaish25a
%I PMLR
%P 326--335
%U https://proceedings.mlr.press/v311/yaish25a.html
%V 311
%X Deep neural networks have been transforming the field of bioinformatics and computational biology in recent years, especially in genomic-related tasks. Trained neural networks have enabled unprecedented capabilities in predicting molecular and genomic phenotypes. A fundamental step following deep-neural-network training and performance evaluation is the interpretation of the trained neural networks to learn new biology and validate the trained models. Techniques like integrated gradients have been highly successful in local interpretability, i.e., attributing importance to a specific residue in a given DNA, RNA, or amino acid sequence. But, there still remains the challenge of finding the global patterns that are shared among many sequences to understand the biological mechanism. Currently, TF-MoDISco is the only available method for this task. However, TF-MoDISco takes hours to run on standard datasets, and it reports many redundant and false motifs. Here, we present F-MoDA (Fourier-based Motif Discovery in Attribution maps), a novel computational method for efficiently and accurately discovering shared sequence motifs in residue-level attribution maps. F-MoDA leverages signal processing techniques and a hierarchical clustering approach to identify recurring regulatory patterns. We evaluated F-MoDA against TF-MoDISco over an established motif-finding benchmark and found that F-MoDA reports motifs that are more similar to the ground truth, in addition to reporting fewer redundant motifs and fewer false motifs. Moreover, F-MoDA runs much faster and uses less memory. We expect F-MoDA to be utilized in many studies applying deep neural networks to genomics data. F-MoDA is publicly available at https://github.com/OrensteinLab/F-MoDA.
APA
Yaish, O., & Orenstein, Y. (2025). F-MoDA: Efficient Fourier-based motif discovery in attribution maps. Proceedings of the 20th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 311:326-335. Available from https://proceedings.mlr.press/v311/yaish25a.html.
