A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks
Proceedings of the 2025 Conference on Applied Machine Learning for Information Security, PMLR 299:200-221, 2025.
Abstract
The widespread adoption of Large Language Models (LLMs) has revolutionized AI deployment, enabling autonomous and semi-autonomous applications across industries through intuitive language interfaces and continuous improvements in model development. However, the attendant increase in autonomy and expansion of access permissions among AI applications also make these systems compelling targets for malicious attacks. Their inherent susceptibility to security flaws necessitates robust defenses, yet no known approaches can prevent zero-day or novel attacks against LLMs. This places AI protection systems in a category similar to established malware protection systems: rather than providing guaranteed immunity, they minimize risk through enhanced observability, multi-layered defense, and rapid threat response, supported by a threat intelligence function designed specifically for AI-related threats. Prior work on LLM protection has largely evaluated individual detection models rather than end-to-end systems designed for continuous, rapid adaptation to a changing threat landscape. To address this gap, we present a production-grade defense system rooted in established malware detection and threat intelligence practices. Our platform integrates three components: a threat intelligence system that turns emerging threats into protections; a data platform that aggregates and enriches information while providing observability, monitoring, and ML operations; and a release platform enabling safe, rapid detection updates without disrupting customer workflows. Together, these components deliver layered protection against evolving LLM threats while generating training data for continuous model improvement and deploying updates without interrupting production. We share these design patterns and practices to surface the often under-documented, practical aspects of LLM security and accelerate progress on operations-focused tooling.