Position: Machine Learning Models Have a Supply Chain Problem

Sarah Meiklejohn, Hayden Blauzvern, Mihai Maruseac, Spencer Schrock, Laurent Simon, Ilia Shumailov
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:81774-81785, 2025.

Abstract

Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have been exploited already in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can be used to bring transparency to open ML models, in terms of enabling model publishers to sign their models and prove properties about the datasets they use.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-meiklejohn25a,
  title = {Position: Machine Learning Models Have a Supply Chain Problem},
  author = {Meiklejohn, Sarah and Blauzvern, Hayden and Maruseac, Mihai and Schrock, Spencer and Simon, Laurent and Shumailov, Ilia},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {81774--81785},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/meiklejohn25a/meiklejohn25a.pdf},
  url = {https://proceedings.mlr.press/v267/meiklejohn25a.html},
  abstract = {Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have been exploited already in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can be used to bring transparency to open ML models, in terms of enabling model publishers to sign their models and prove properties about the datasets they use.}
}
Endnote
%0 Conference Paper
%T Position: Machine Learning Models Have a Supply Chain Problem
%A Sarah Meiklejohn
%A Hayden Blauzvern
%A Mihai Maruseac
%A Spencer Schrock
%A Laurent Simon
%A Ilia Shumailov
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-meiklejohn25a
%I PMLR
%P 81774--81785
%U https://proceedings.mlr.press/v267/meiklejohn25a.html
%V 267
%X Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have been exploited already in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can be used to bring transparency to open ML models, in terms of enabling model publishers to sign their models and prove properties about the datasets they use.
APA
Meiklejohn, S., Blauzvern, H., Maruseac, M., Schrock, S., Simon, L. & Shumailov, I. (2025). Position: Machine Learning Models Have a Supply Chain Problem. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:81774-81785. Available from https://proceedings.mlr.press/v267/meiklejohn25a.html.
