A Tale of Two LLMs - Safety vs. Complexity
What we’re reading: In this hypersonic era of “I need a bot for that!”, we are fascinated by this blog detailing Dual LLM patterns that are resistant to prompt injection attacks. Think of this as a really close digital friend: one who you can trust, but never be 100% certain that secrets will be kept just between you two. The article by Simon Willison asks readers an interesting question: “What is a safe subset of an AI assistant we can build today, in the age of ever more prompt injection vulnerabilities?” We have thoughts on this at Protect AI, so stay tuned on that. For now, here’s our quick take on why this blog matters.
Relevance to ML Security: The blog discusses how prompt injection types of attacks (and, others) can be used to “trick” learning models into misclassifying data. One of the key remediation strategies Simon proposes is by implementing a Dual LLM Pattern. Simon details Dual LLMs as: “The Privileged LLM is the core of the AI assistant. It accepts input from trusted sources—primarily the user themselves—and acts on that input in various ways” Meanwhile, “The Quarantined LLM is used any time we need to work with untrusted content—content that might conceivably incorporate a prompt injection attack. It does not have access to tools, and is expected to have the potential to go rogue at any moment.” A really interesting approach, but one that also has challenges - which Simon details in a very objective manner.
For the ML Team: When ML builders begin to implement a Dual LLM Pattern for more secure LLM deployments, there are some key things to keep in mind. Chief among them: Chaining. This common technique of using the output of one LLM prompt as the input to another creates some dangers that lead to new and nasty exploits. Managing the interfaces between the two LLMs - Privileged and Quarantined - should be watched closely. This custom build method of stitching together the LLMs and rendering that to the end user can increase the complexity of a system, but also improve its security. Yet, if one isn’t very mindful during implementation, the ramifications can be deadly. Simon describes it as possibly “radioactive.” Scary!
For the SEC Team: Prompt injection attacks are cropping up as fast as LLMs are being deployed. (They may even be morphing faster than LLMs. Who knows?) There are a range of traditional attacks that can be utilized here, including social engineering which is, perhaps, the most bedeviling of all security issues. Using social engineering as an “old dog, new trick” method in LLMs and Dual LLM Patterns could trick users into clicking on malicious links or sending sensitive data to an attacker. Simon says, “Social engineering is all about convincing language. Producing convincing language is the core competency of any LLM, especially when prompted by someone malicious who knows how to best direct them.” #100!
Our thoughts: To secure LLM applications, we need new architectural approaches for ML stacks which dramatically increase complexity with many more assets added to the ML software supply chain. Managing fine-grain data access controls, asset attestation, and having a traceable activity ledger is crucial. ML system builders should prepare for this increase in complexity, users, and assets by considering a ML Bill of Materials, or MLBOM.
An MLBOM is an essential tool for ML development, enabling reproducibility, versioning, and accurate reconstructions of ML models. It manages vulnerabilities, maintains compatibility, and evaluates third-party models for security. MLBOMs also facilitate compliance with governance and regulatory requirements by documenting components and versions.
Protect AI builds these tools to enhance security through transparency, collaboration, and addressing ML system challenges. Request a meeting with Protect AI solution architects to learn more and improve visibility, security, and auditability in your ML environment with AI Radar.