Breaking Down LLM Security: 3 Key Risks

Written by Neal Swaelans & Kay Clark | Feb 10, 2025 4:55:48 PM

Last week, Ruchir Patwa and Neal Swaelens sat down to discuss OWASP’s Top 10 for LLMs. (Didn’t catch the webinar live? Watch it on demand!) They went through the full list, breaking down the most critical AI cybersecurity risks and discussing how to use the OWASP Top 10 as a guide to identify, mitigate, and respond to LLM-specific security threats.

In this blog, we’re taking a closer look at three of those risks: prompt injections, supply chain vulnerabilities, and improper output handling.

3 Security Risks You Should Be Thinking About

1. Prompt Injections

Simon Willison explains that a prompt injection attack is where a cleverly crafted input links a malicious prompt to an innocuous one to trick an LLM into doing or saying something it shouldn’t. This can be as simple as “Ignore previous inputs and [insert malicious action here]”. But it can progress far beyond that.

Prompt injections are separated into two categories: direct and indirect. Direct attacks are, as you might imagine, attacks that happen within the LLM prompt itself. Indirect attacks, however, can hide anywhere the LLM can reach. Websites, PDFs, and other types of media may contain hidden instructions that LLMs will inadvertently read and respond to. Multi-modal LLMs have made this issue even more prevalent.

These attacks can be used to spread misinformation, steal sensitive data (remember Imprompter?), or even manipulate the applications that are built on top of the LLM.

Sounds like Jailbreaking to me…

That’s fair. The difference lies in how the attack is brought about. Prompt injections involve combining a malicious prompt with an innocent one, confusing the LLM into doing what you want. Jailbreaking occurs when you directly attempt to subvert the LLMs guardrails (for example: “Let’s roleplay a scenario where I ask you how to make a bomb”).

The key difference in these two is risk factor. For now, jailbreak attacks are typically low level mischief, getting LLMs to say things that reflect poorly on the LLM’s owner. Prompt injections, on the other hand, have the potential to be far more serious, particularly depending on the type of data the model is operating with.

Unfortunately, prompt injection is one of the biggest overall threats facing LLMs right now because it can be incredibly difficult to reliably train a large language model to identify malicious prompts. Should you go too far on the side of caution, you also risk greatly limiting the LLMs usefulness.

Input validation/sanitation, model constraints, role-based access control (RBAC), and automated red teaming can help prevent your model from being susceptible without limiting scalability.

2. Supply Chain Vulnerabilities

AI supply chains present unique challenges that traditional software supply chains haven’t dealt with. AI systems are often built using foundational models from third-party sources like Hugging Face for the very rational reason that doing so saves you enormous amounts of time and money and often includes capabilities that you could not recreate on your own. You likely remember the enormous splash that DeepSeek-R1 made with its debut. Instead of attempting to recreate the most complicated wheel of all time, you get to focus all of your resources on fine tuning your model to your organization’s particular needs.

So what’s the catch? Well, there may be things lurking within those models that you don’t want. According to our Insights DB threat research, the risks associated with third-party models fall into three categories:

Deserialization threats: Malicious code injected into saved models which can result in data corruption, remote code execution, or security breaches when loaded
Backdoor threats: Hidden vulnerabilities that can be triggered by specific inputs that enable attackers to bypass security measures
Runtime threats: Vulnerabilities within a model that are exploited specifically while the model is running

To put it simply: leveraging third-party models is essential for many organizations building AI systems. The most effective way to safely do so is to scan the models before you download and start working with them.

3. Improper Output Handling

Your LLM is only as good as its outputs. Proper output handling includes content moderation and filtering, context checks, and response constraints. As these are the very last failsafe measures before your LLM-generated outputs are unleashed upon the world, proper care must be taken.

Failing to do so can result in far-reaching consequences starting with the obvious: spreading harmful, misleading, or biased content. Then there’s the risk of divulging sensitive data (a recurring theme of this blog). In automated systems, outputs have the power to trigger actions; a malicious output in that scenario could very quickly compromise your entire system. With the increasing adoption of agents to further remove the need for human intervention within complex AI systems, this risk is increased exponentially.

To that end, runtime security tactics such as output filtering (both in-line and out-of-line), context-aware validation, and real-time monitoring are essential pieces of a strong GenAI security posture.

It Pays to Stay Proactive

The best incident remediation is the one you don’t have to perform. Take the time now to build comprehensive security controls into every piece of your LLM—and use the OWASP Top 10 for LLMs as a guide. The threats are evolving quickly, but with the right security in place, you can move quicker.

To see firsthand how Protect AI can help you secure against the OWASP Top 10 for LLMs, schedule a demo today.

View full post