Understanding Model Threats
This resource is designed to provide detailed information on various threat categories, helping you understand and mitigate potential risks in AI and machine learning systems.
Runtime Threats
Like a deserialization threat, a runtime threat occurs when untrusted data or code is used to reconstruct objects, leading to potential exploitation. The difference lies in when the malicious code is triggered: a basic deserialization threat executes at model load time, while a runtime threat is triggered when the model is used for inference or any other form of execution. In AI and machine learning systems, this allows malicious actors to inject harmful code that exploits vulnerabilities to gain unauthorized access or manipulate your system's behavior. Understanding these threats is crucial for securing data integrity and preventing unauthorized code execution in your AI models.
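For illustration, here is a minimal, hypothetical sketch in eager PyTorch (not the TorchScript serialization path itself) of how a runtime threat differs from a load-time one: the module below loads without incident, but a hidden side effect fires only when the model is invoked for inference. The BackdooredModel class and the echoed command are invented for this example.

```python
import os
import torch

class BackdooredModel(torch.nn.Module):
    """Looks like an ordinary model, but its forward pass carries a payload."""
    def forward(self, x):
        # Hidden side effect: runs every time the model is called for
        # inference, not when the model object is created or loaded.
        os.system("echo 'payload executed at inference time'")
        return x * 2

model = BackdooredModel()       # nothing malicious happens yet
output = model(torch.ones(3))   # the payload fires here, at runtime
```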
PAIT-TCHST-300
TorchScript Model Arbitrary Code Execution Detected at Model Load Time
Overview
TorchScript is PyTorch's intermediate representation format that allows models to be serialized and run independently of Python. The format saves models as directories containing both Python source files and Pickle files, creating multiple attack vectors for malicious code injection.
Models flagged for this threat meet the following criteria:
- The model format is detected as TorchScript.
- The model contains potentially malicious code in internal Python files or Pickle files, which will run when the model is executed.
Pickle is Python's original serialization module, used for serializing and deserializing Python objects so they can be shared between processes or machines. While convenient, Pickle poses significant security risks when used with untrusted data, as it can execute arbitrary code during deserialization. This makes it vulnerable to remote code execution attacks if an attacker can control the serialized data.
In this case, running the model will execute that code, along with whatever malicious instructions have been inserted into it.
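As a concrete, hypothetical sketch of the underlying Pickle risk: the __reduce__ hook lets serialized bytes instruct the unpickler to call an arbitrary function, so simply deserializing attacker-controlled data executes code. The class name and echoed command below are invented for illustration.

```python
import os
import pickle

class MaliciousPayload:
    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild this object; a malicious
        # payload abuses it to have the unpickler call os.system instead.
        return (os.system, ("echo 'arbitrary code executed during unpickling'",))

payload_bytes = pickle.dumps(MaliciousPayload())

# The command runs here, during deserialization, before any model code
# is ever invoked by the victim.
pickle.loads(payload_bytes)
```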
Key Points:
- TorchScript models should be treated as executable code packages.
- The TorchScript format saves model code, data, and inference instructions in a directory containing both Python files and Pickle files.
- Both Python and Pickle files can contain malicious code that will run automatically when the model is executed.
- Only use/load models from trusted sources.
Background Information
TorchScript Format
TorchScript uses a directory-based serialization format with the following base structure:
```
model_directory/
├── code/
│   ├── __torch__.py
│   └── __torch__/
│       └── module_implementations.py
├── data.pkl
├── constants.pkl
└── version
```
The format includes:
- Python source files (.py) containing the model's computational graph and custom operations
- Pickle files (.pkl) storing serialized tensors, parameters, and metadata
- Version files for compatibility checking
This code-inclusive serialization extends portability but creates security vulnerabilities that attackers can exploit to distribute malicious code.
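To see this for yourself, the short sketch below (assuming a working PyTorch install; the model and file names are made up) saves a scripted module and lists the Python and Pickle entries bundled inside the resulting archive, which is an ordinary ZIP file.

```python
import zipfile
import torch

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return x * 2

# Save a scripted module; the resulting .pt file is a ZIP archive laid out
# like the directory structure shown above.
torch.jit.script(TinyModel()).save("tiny_model.pt")

# List the code and pickle entries that ship inside the archive.
with zipfile.ZipFile("tiny_model.pt") as archive:
    for name in archive.namelist():
        if name.endswith((".py", ".pkl")):
            print(name)
```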
Further reading:
- PyTorch TorchScript Documentation
- TorchScript Serialization Format
- Never a dill moment: Exploiting machine learning pickle files.
Impact
This attack can harm your organization in any of the following ways:
- Leaking data scientist or company GitHub credentials to the attacker.
- Leaking data scientist or company cloud credentials, resulting in IP theft of data and other models, up to full cloud takeover attacks.
- Altering the behavior of expected models, enabling fraud, degraded model performance, and adversarial bypasses.
- Corruption of other models, by leveraging stolen credentials to rewrite known model assets within your environments.
How The Attack Works
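In broad strokes, the attack follows the pattern described above: an attacker creates or tampers with a TorchScript model so that its bundled Python source files or Pickle files contain malicious instructions, then distributes it through a model hub, shared storage, or a compromised pipeline. When a data scientist or production service loads and runs the model, the embedded code executes with the victim's privileges, giving the attacker a foothold to steal credentials, exfiltrate data, or alter model behavior as outlined in the Impact section.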
Best Practices
You should:
- Only load and execute models from trusted sources.
- Implement a vetting process for third-party models before use, including scanning.
- Use sandboxing techniques when loading untrusted models.
- Use model formats that don't allow arbitrary code execution, such as SafeTensors, which provides a safe alternative to traditional serialization formats.
Remediation
If possible, use a different model format such as SafeTensors to eliminate this type of code injection attack from your workflow entirely.
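As a rough sketch of what that can look like (assuming the safetensors package is installed and that you already trust the code defining the model architecture), the weights are exported as plain tensors with no executable content:

```python
import torch
from safetensors.torch import load_file, save_file

# A stand-in for your real model; the architecture code itself still has to
# come from a source you trust.
model = torch.nn.Linear(4, 2)

# Save only the tensors: no Python source and no pickle opcodes are stored.
save_file(model.state_dict(), "model.safetensors")

# Loading returns a plain dict of tensors; nothing executes while parsing.
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)
```

Note that SafeTensors stores weights only, so the model architecture must be defined in your own code rather than shipped inside the file.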
If not possible, reach out to the model creator and alert them that the model has failed our scan. You can even link to the specific page on our Insights Database to provide our most up-to-date findings.
The model provider should also report what they did to correct this issue as part of their release notes.