Understanding Model Threats
This resource is designed to provide detailed information on various threat categories, helping you understand and mitigate potential risks in AI and machine learning systems.
Deserialization Threats
Deserialization threats occur when untrusted data or code is used to reconstruct objects, leading to potential exploitation. In AI and machine learning systems, this can result in malicious actors injecting harmful code during the deserialization process, exploiting vulnerabilities to gain unauthorized access or manipulate your system's behavior. Understanding deserialization threats is crucial for securing data integrity and preventing unauthorized code execution in your AI models.
Overview
Machine learning models are typically developed in PyTorch or another framework and then converted to GGUF. GGUF is a binary format optimized for fast loading and saving of models. The format was developed by the llama.cpp team and is designed for efficient inference.
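For context, the snippet below is a minimal sketch of loading a GGUF model for inference. It assumes the llama-cpp-python bindings (not part of this guide), and the model path is a placeholder.

```python
# Minimal sketch: loading a GGUF model for inference.
# Assumes llama-cpp-python is installed; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="path/to/model.gguf", n_ctx=2048)  # memory-maps the GGUF file
output = llm("Q: What is GGUF? A:", max_tokens=64)
print(output["choices"][0]["text"])
```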
Chat templates are commonly used with large language models to format prompts. A security risk arises when a Jinja chat template is rendered outside a sandboxed environment, which can lead to arbitrary code execution.
When a Jinja template is rendered in a sandboxed environment, any unsafe operation in the template raises an exception. Rendering a Jinja template in a sandboxed environment therefore lets developers verify that the template is safe to load for downstream tasks.
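As an illustration of that behavior, the sketch below renders a template that attempts unsafe attribute access (the typical first step of a template-injection payload) inside Jinja2's SandboxedEnvironment. The payload string is illustrative only.

```python
import jinja2.exceptions
import jinja2.sandbox

# A template that tries to reach Python internals through unsafe attribute access.
suspicious_template = "{{ ''.__class__.__mro__ }}"

env = jinja2.sandbox.SandboxedEnvironment()
try:
    env.from_string(suspicious_template).render()
    print("Template rendered without raising a security error")
except jinja2.exceptions.SecurityError as err:
    print(f"Blocked by sandbox: {err}")
```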
If a model is reported with this issue, it means:
- The model is serialized in the GGUF format.
- The model's Jinja template contains potentially malicious code that will execute when the model is loaded.
Key Points
- GGUF models consist of tensors and a standardized set of metadata; a chat template can be included as part of that metadata.
- GGUF uses Jinja2 templating to format the prompt (see the sketch after this list).
- Attackers can insert malicious code into the Jinja template.
- Loading a GGUF model that uses a Jinja template will execute any code in the template (malicious or otherwise).
- Only load models from trusted sources.
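To make the templating point concrete, here is a small sketch that renders an illustrative chat template with Jinja2 to build a prompt from a list of messages. The template and message contents are made up for the example and are not taken from any real model.

```python
import jinja2.sandbox

# An illustrative (benign) chat template, similar in spirit to what a GGUF model
# stores under the tokenizer.chat_template metadata key.
chat_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}\n"
    "{% endfor %}"
    "<|assistant|>\n"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize GGUF in one sentence."},
]

env = jinja2.sandbox.SandboxedEnvironment()
prompt = env.from_string(chat_template).render(messages=messages)
print(prompt)
```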
Jinja Template Rendering in a Sandboxed Environment
Here is sample code we can run to check that a GGUF model's chat template is safe:
- First, we retrieve the chat template string from the GGUF model's metadata using get_chat_template_str():

```python
from gguf.gguf_reader import GGUFReader
import jinja2.sandbox

def get_chat_template_str(file_path):
    reader = GGUFReader(file_path)

    for key, field in reader.fields.items():
        if key == "tokenizer.chat_template":
            value = field.parts[field.data[0]]

            # Convert the list of integers to a string
            chat_template_str = ''.join(chr(i) for i in value)
            return chat_template_str

gguf_model_path = "enter model path here"
gguf_chat_template = get_chat_template_str(gguf_model_path)
```
- Next, we load the retrieved chat template in Jinja's sandboxed environment using run_jinja_template():

```python
def run_jinja_template(chat_template: str) -> bool:
    try:
        sandboxed_env = jinja2.sandbox.SandboxedEnvironment()
        template = sandboxed_env.from_string(chat_template)
        template.render()
    except jinja2.exceptions.SecurityError:
        return True
    except Exception:
        pass

    return False

print(f"Testing GGUF model:\nFound security error in Jinja template? {run_jinja_template(gguf_chat_template)}")
```
- The output of the last cell is either True or False, indicating whether a security error was found in the Jinja template.
Impact
An attacker could exploit a compromised template to:
- Access sensitive information (e.g., SSH keys, cloud credentials)
- Execute malicious code on your system
- Use the compromised system as a vector for broader attacks
Note: Malicious code execution via a Jinja template can be achieved without affecting the model's performance; the user may never know that the attack has happened or is ongoing.
How The Attack Works:
Best Practices
You should:
- Only load and execute models from trusted sources
- Implement a vetting process for third-party models before use
- Use sandboxing techniques when loading untrusted models
- Regularly update GGUF and related libraries to benefit from security patches
Remediation
GGUF models often come with a Jinja chat template. If possible, render the model's Jinja template inside a sandboxed environment when loading GGUF models, as shown below.
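As a sketch of that workflow, the helper below reuses get_chat_template_str() and run_jinja_template() from the sample code above; check_gguf_chat_template is an illustrative name, not part of any library.

```python
# Illustrative pre-load check built from the functions defined earlier in this page.
def check_gguf_chat_template(model_path: str) -> bool:
    """Return True if the model's chat template triggers a Jinja sandbox SecurityError."""
    chat_template = get_chat_template_str(model_path)
    if chat_template is None:
        return False  # no chat template present, nothing to render
    return run_jinja_template(chat_template)

if check_gguf_chat_template("enter model path here"):
    print("Security error found in chat template - do not load this model.")
```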
If that is not possible, reach out to the model creator and alert them that the model has failed our scan. You can also link to the specific page in our Insights Database to provide our most up-to-date findings.
The model provider should also report what they did to correct this issue as part of their release notes.