A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks

Adam Swanda, Amy Chang, Alexander Chen, Fraser Burch, Paul Kassianik, Konstantin Berlin
Proceedings of the 2025 Conference on Applied Machine Learning for Information Security, PMLR 299:200-221, 2025.

Abstract

The widespread adoption of {Large Language Models} ({LLMs}) has revolutionized {AI} deployment, enabling autonomous and semi-autonomous applications across industries through intuitive language interfaces and continuous improvements in model development. However, the attendant increase in autonomy and expansion of access permissions among {AI} applications also make these systems compelling targets for malicious attacks. Their inherent susceptibility to security flaws necessitates robust defenses, yet no known approaches can prevent zero-day or novel attacks against {LLMs}. This places {AI} protection systems in a category similar to established malware protection systems: rather than providing guaranteed immunity, they minimize risk through enhanced observability, multi-layered defense, and rapid threat response, supported by a threat intelligence function designed specifically for {AI}-related threats. Prior work on {LLM} protection has largely evaluated individual detection models rather than end-to-end systems designed for continuous, rapid adaptation to a changing threat landscape. To address this gap, we present a production-grade defense system rooted in established malware detection and threat intelligence practices. Our platform integrates three components: a threat intelligence system that turns emerging threats into protections; a data platform that aggregates and enriches information while providing observability, monitoring, and {ML} operations; and a release platform enabling safe, rapid detection updates without disrupting customer workflows. Together, these components deliver layered protection against evolving {LLM} threats while generating training data for continuous model improvement and deploying updates without interrupting production. We share these design patterns and practices to surface the often under-documented, practical aspects of {LLM} security and accelerate progress on operations-focused tooling.

Cite this Paper


BibTeX
@InProceedings{pmlr-v299-swanda25a, title = {A Framework for Rapidly Developing and Deploying Protection Against {Large Language Model} Attacks}, author = {Swanda, Adam and Chang, Amy and Chen, Alexander and Burch, Fraser and Kassianik, Paul and Berlin, Konstantin}, booktitle = {Proceedings of the 2025 Conference on Applied Machine Learning for Information Security}, pages = {200--221}, year = {2025}, editor = {Raff, Edward and Rudd, Ethan M.}, volume = {299}, series = {Proceedings of Machine Learning Research}, month = {22--24 Oct}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v299/main/assets/swanda25a/swanda25a.pdf}, url = {https://proceedings.mlr.press/v299/swanda25a.html}, abstract = {The widespread adoption of {Large Language Models} ({LLMs}) has revolutionized {AI} deployment, enabling autonomous and semi-autonomous applications across industries through intuitive language interfaces and continuous improvements in model development. However, the attendant increase in autonomy and expansion of access permissions among {AI} applications also make these systems compelling targets for malicious attacks. Their inherent susceptibility to security flaws necessitates robust defenses, yet no known approaches can prevent zero-day or novel attacks against {LLMs}. This places {AI} protection systems in a category similar to established malware protection systems: rather than providing guaranteed immunity, they minimize risk through enhanced observability, multi-layered defense, and rapid threat response, supported by a threat intelligence function designed specifically for {AI}-related threats. Prior work on {LLM} protection has largely evaluated individual detection models rather than end-to-end systems designed for continuous, rapid adaptation to a changing threat landscape. To address this gap, we present a production-grade defense system rooted in established malware detection and threat intelligence practices. Our platform integrates three components: a threat intelligence system that turns emerging threats into protections; a data platform that aggregates and enriches information while providing observability, monitoring, and {ML} operations; and a release platform enabling safe, rapid detection updates without disrupting customer workflows. Together, these components deliver layered protection against evolving {LLM} threats while generating training data for continuous model improvement and deploying updates without interrupting production. We share these design patterns and practices to surface the often under-documented, practical aspects of {LLM} security and accelerate progress on operations-focused tooling.} }
Endnote
%0 Conference Paper %T A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks %A Adam Swanda %A Amy Chang %A Alexander Chen %A Fraser Burch %A Paul Kassianik %A Konstantin Berlin %B Proceedings of the 2025 Conference on Applied Machine Learning for Information Security %C Proceedings of Machine Learning Research %D 2025 %E Edward Raff %E Ethan M. Rudd %F pmlr-v299-swanda25a %I PMLR %P 200--221 %U https://proceedings.mlr.press/v299/swanda25a.html %V 299 %X The widespread adoption of {Large Language Models} ({LLMs}) has revolutionized {AI} deployment, enabling autonomous and semi-autonomous applications across industries through intuitive language interfaces and continuous improvements in model development. However, the attendant increase in autonomy and expansion of access permissions among {AI} applications also make these systems compelling targets for malicious attacks. Their inherent susceptibility to security flaws necessitates robust defenses, yet no known approaches can prevent zero-day or novel attacks against {LLMs}. This places {AI} protection systems in a category similar to established malware protection systems: rather than providing guaranteed immunity, they minimize risk through enhanced observability, multi-layered defense, and rapid threat response, supported by a threat intelligence function designed specifically for {AI}-related threats. Prior work on {LLM} protection has largely evaluated individual detection models rather than end-to-end systems designed for continuous, rapid adaptation to a changing threat landscape. To address this gap, we present a production-grade defense system rooted in established malware detection and threat intelligence practices. Our platform integrates three components: a threat intelligence system that turns emerging threats into protections; a data platform that aggregates and enriches information while providing observability, monitoring, and {ML} operations; and a release platform enabling safe, rapid detection updates without disrupting customer workflows. Together, these components deliver layered protection against evolving {LLM} threats while generating training data for continuous model improvement and deploying updates without interrupting production. We share these design patterns and practices to surface the often under-documented, practical aspects of {LLM} security and accelerate progress on operations-focused tooling.
APA
Swanda, A., Chang, A., Chen, A., Burch, F., Kassianik, P. & Berlin, K.. (2025). A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks. Proceedings of the 2025 Conference on Applied Machine Learning for Information Security, in Proceedings of Machine Learning Research 299:200-221 Available from https://proceedings.mlr.press/v299/swanda25a.html.

Related Material