Efficiently predicting high resolution mass spectra with graph neural networks

Michael Murphy, Stefanie Jegelka, Ernest Fraenkel, Tobias Kind, David Healey, Thomas Butler
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:25549-25562, 2023.

Abstract

Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over chemical formulas. We further discover that a large corpus of mass spectra can be closely approximated using a fixed vocabulary constituting only 2% of all observed formulas. This enables efficient spectrum prediction using an architecture similar to graph classification - GrAFF-MS - achieving significantly lower prediction error and greater retrieval accuracy than previous approaches.
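The fixed-vocabulary idea in the abstract can be illustrated with a toy sketch. This is not the authors' code: the spectrum representation (a dict from formula string to intensity), the function names `build_vocabulary` and `coverage`, the ranking of formulas by total intensity, and the toy data are all assumptions made for illustration. The abstract does not state how the vocabulary is actually selected.

```python
from collections import Counter

def build_vocabulary(spectra, fraction):
    """Keep the top `fraction` of distinct formulas.

    Ranking by summed intensity is an assumption for this sketch,
    not the selection rule used in the paper.
    """
    totals = Counter()
    for spectrum in spectra:
        for formula, intensity in spectrum.items():
            totals[formula] += intensity
    keep = max(1, round(fraction * len(totals)))
    return [formula for formula, _ in totals.most_common(keep)]

def coverage(spectra, vocabulary):
    """Fraction of total intensity explained by formulas in the vocabulary."""
    vocab = set(vocabulary)
    covered = sum(i for s in spectra for f, i in s.items() if f in vocab)
    total = sum(i for s in spectra for i in s.values())
    return covered / total

# Hypothetical toy corpus: each spectrum maps a formula to a relative intensity.
toy_spectra = [
    {"C2H4O": 0.6, "H2O": 0.3, "C3H7N": 0.1},
    {"C2H4O": 0.5, "H2O": 0.4, "CH3": 0.1},
]

toy_vocab = build_vocabulary(toy_spectra, fraction=0.5)
toy_coverage = coverage(toy_spectra, toy_vocab)
```

In this toy corpus, half of the distinct formulas account for most of the intensity. The abstract reports the analogous finding at scale: a vocabulary of only 2% of observed formulas closely approximates a large corpus, which is what lets the output be treated as a distribution over a fixed set of classes.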

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-murphy23a,
  title = {Efficiently predicting high resolution mass spectra with graph neural networks},
  author = {Murphy, Michael and Jegelka, Stefanie and Fraenkel, Ernest and Kind, Tobias and Healey, David and Butler, Thomas},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {25549--25562},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/murphy23a/murphy23a.pdf},
  url = {https://proceedings.mlr.press/v202/murphy23a.html},
  abstract = {Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over chemical formulas. We further discover that a large corpus of mass spectra can be closely approximated using a fixed vocabulary constituting only 2% of all observed formulas. This enables efficient spectrum prediction using an architecture similar to graph classification - GrAFF-MS - achieving significantly lower prediction error and greater retrieval accuracy than previous approaches.}
}
Endnote
%0 Conference Paper
%T Efficiently predicting high resolution mass spectra with graph neural networks
%A Michael Murphy
%A Stefanie Jegelka
%A Ernest Fraenkel
%A Tobias Kind
%A David Healey
%A Thomas Butler
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-murphy23a
%I PMLR
%P 25549--25562
%U https://proceedings.mlr.press/v202/murphy23a.html
%V 202
%X Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over chemical formulas. We further discover that a large corpus of mass spectra can be closely approximated using a fixed vocabulary constituting only 2% of all observed formulas. This enables efficient spectrum prediction using an architecture similar to graph classification - GrAFF-MS - achieving significantly lower prediction error and greater retrieval accuracy than previous approaches.
APA
Murphy, M., Jegelka, S., Fraenkel, E., Kind, T., Healey, D., & Butler, T. (2023). Efficiently predicting high resolution mass spectra with graph neural networks. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:25549-25562. Available from https://proceedings.mlr.press/v202/murphy23a.html.