FlashTP: Fused, Sparsity-Aware Tensor Product for Machine Learning Interatomic Potentials

Seung Yul Lee, Hojoon Kim, Yutack Park, Dawoon Jeong, Seungwu Han, Yeonhong Park, Jae W. Lee
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:33143-33156, 2025.

Abstract

Machine Learning Interatomic Potentials (MLIPs) enable efficient molecular dynamics (MD) simulations with high accuracy. While equivariant MLIPs achieve state-of-the-art accuracy, they face significant computational bottlenecks centered around their Tensor-Product layer, which accounts for up to 75% of training time and causes substantial memory overhead. We present FlashTP, a highly optimized tensor-product library that addresses these inefficiencies through kernel fusion, sparse computation, and path-aggregated execution. FlashTP achieves up to 41.6$\times$ and 60.8$\times$ kernel speedups over e3nn and NVIDIA cuEquivariance, respectively. For SevenNet-l3i5, it delivers 4.2$\times$ and 3.5$\times$ speedups while reducing peak memory usage by 6.3$\times$ and 6.2$\times$ for inference and training, respectively. The code is available at https://github.com/SNU-ARC/flashTP.
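
To make the bottleneck concrete, the sketch below builds the kind of stock e3nn tensor product that equivariant MLIPs evaluate on every edge of the atomic graph, i.e., the operation FlashTP fuses and sparsifies. The irreps strings, edge count, and variable names are illustrative assumptions rather than the paper's or SevenNet's actual configuration, and FlashTP's own API is not shown here.

    # Minimal sketch of a per-edge equivariant tensor product in stock e3nn (PyTorch).
    # All sizes below are assumed for illustration only.
    from e3nn import o3

    # Sender-node features and spherical-harmonic edge attributes, as in typical equivariant MLIPs.
    irreps_node = o3.Irreps("32x0e + 16x1o + 8x2e")
    irreps_sh = o3.Irreps.spherical_harmonics(lmax=2)   # 1x0e + 1x1o + 1x2e
    irreps_out = o3.Irreps("32x0e + 16x1o + 8x2e")

    # The Tensor-Product layer that dominates runtime: one weighted Clebsch-Gordan
    # contraction per allowed (l1, l2, l3) path, applied independently to every edge.
    tp = o3.FullyConnectedTensorProduct(irreps_node, irreps_sh, irreps_out)

    num_edges = 10_000
    x = irreps_node.randn(num_edges, -1)   # node features gathered onto edges
    sh = irreps_sh.randn(num_edges, -1)    # edge spherical harmonics
    out = tp(x, sh)                        # evaluated path by path in stock e3nn
    print(out.shape)                       # torch.Size([10000, 120])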

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-lee25l,
  title     = {{F}lash{TP}: Fused, Sparsity-Aware Tensor Product for Machine Learning Interatomic Potentials},
  author    = {Lee, Seung Yul and Kim, Hojoon and Park, Yutack and Jeong, Dawoon and Han, Seungwu and Park, Yeonhong and Lee, Jae W.},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {33143--33156},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lee25l/lee25l.pdf},
  url       = {https://proceedings.mlr.press/v267/lee25l.html},
  abstract  = {Machine Learning Interatomic Potentials (MLIPs) enable efficient molecular dynamics (MD) simulations with high accuracy. While equivariant MLIPs achieve state-of-the-art accuracy, they face significant computational bottlenecks centered around their Tensor-Product layer, which accounts for up to 75% of training time and causes substantial memory overhead. We present FlashTP, a highly optimized tensor-product library that addresses these inefficiencies through kernel fusion, sparse computation, and path-aggregated execution. FlashTP achieves up to 41.6$\times$ and 60.8$\times$ kernel speedups over e3nn and NVIDIA cuEquivariance, respectively. For SevenNet-l3i5, it delivers 4.2$\times$ and 3.5$\times$ speedups while reducing peak memory usage by 6.3$\times$ and 6.2$\times$ for inference and training, respectively. The code is available at https://github.com/SNU-ARC/flashTP.}
}
Endnote
%0 Conference Paper
%T FlashTP: Fused, Sparsity-Aware Tensor Product for Machine Learning Interatomic Potentials
%A Seung Yul Lee
%A Hojoon Kim
%A Yutack Park
%A Dawoon Jeong
%A Seungwu Han
%A Yeonhong Park
%A Jae W. Lee
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-lee25l
%I PMLR
%P 33143--33156
%U https://proceedings.mlr.press/v267/lee25l.html
%V 267
%X Machine Learning Interatomic Potentials (MLIPs) enable efficient molecular dynamics (MD) simulations with high accuracy. While equivariant MLIPs achieve state-of-the-art accuracy, they face significant computational bottlenecks centered around their Tensor-Product layer, which accounts for up to 75% of training time and causes substantial memory overhead. We present FlashTP, a highly optimized tensor-product library that addresses these inefficiencies through kernel fusion, sparse computation, and path-aggregated execution. FlashTP achieves up to 41.6$\times$ and 60.8$\times$ kernel speedups over e3nn and NVIDIA cuEquivariance, respectively. For SevenNet-l3i5, it delivers 4.2$\times$ and 3.5$\times$ speedups while reducing peak memory usage by 6.3$\times$ and 6.2$\times$ for inference and training, respectively. The code is available at https://github.com/SNU-ARC/flashTP.
APA
Lee, S.Y., Kim, H., Park, Y., Jeong, D., Han, S., Park, Y. & Lee, J.W. (2025). FlashTP: Fused, Sparsity-Aware Tensor Product for Machine Learning Interatomic Potentials. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:33143-33156. Available from https://proceedings.mlr.press/v267/lee25l.html.
