Understanding Model Threats
This resource is designed to provide detailed information on various threat categories, helping you understand and mitigate potential risks in AI and machine learning systems.
Deserialization Threats
Deserialization threats occur when untrusted data or code is used to reconstruct objects, opening the door to exploitation. In AI and machine learning systems, a malicious actor can inject harmful code that runs during the deserialization process, exploiting vulnerabilities to gain unauthorized access or manipulate your system's behavior. Understanding deserialization threats is crucial for securing data integrity and preventing unauthorized code execution in your AI models.
Overview
Deserialization threats in AI and machine learning systems pose significant security risks, particularly when third-party libraries such as Hydra are used to process executable directives in model configuration files. This article outlines the specific threat, its potential impact, and actionable steps to mitigate the risk.
Models flagged for this threat meet the following criteria:
- Built with a model framework, such as NeMo or Safetensors, that stores model metadata in YAML or JSON
- Incorporate Hydra targets within the metadata
- Contain potentially suspicious code within these targets that executes upon model loading
Hydra is an open-source framework for managing complex configurations: it parses a configuration object that describes a target interface and instantiates it. The target can be any callable Python directive, and Hydra will execute it, passing in whatever arguments the configuration supplies.
Several popular ML frameworks, including NVIDIA’s NeMo and Apple’s ml-flextok (which uses Safetensors), rely on Hydra to load models from a metadata configuration. This carries a risk of arbitrary code execution: if suspicious code is injected into the metadata, Hydra will execute it.
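To make the mechanism concrete, the following is a minimal sketch of Hydra's instantiate behavior. It assumes Hydra is installed; it is illustrative only and is not taken from NeMo or ml-flextok.

```python
# Minimal sketch: Hydra imports whatever dotted path appears in "_target_"
# and calls it with the remaining keys as keyword arguments.
from hydra.utils import instantiate

# A benign configuration entry: builds a datetime.timedelta object.
benign_cfg = {"_target_": "datetime.timedelta", "hours": 1}
print(instantiate(benign_cfg))  # -> 1:00:00

# The same mechanism will just as readily call subprocess.run, os.system, or
# any other callable if an attacker rewrites the target in the model metadata.
```

The key point is that nothing restricts the target to model classes; any importable callable named in the configuration will be executed.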
Key Points
- Deserialization threats can lead to unauthorized code execution in AI models.
- NeMo models and others using Hydra directives to load model configurations are particularly vulnerable.
- Attacks can result in exfiltration of sensitive file contents when combined with other methods.
- Mitigation strategies include a thorough vetting process and an understanding of all third-party dependencies.
Impact
An attacker could exploit a compromised model to:
- Access sensitive information (e.g., SSH keys, cloud credentials)
- Access sensitive model information or dataset content (e.g., weights, CSV data)
The Hydra directives execute at model load time, potentially compromising your system before the model is even used.
How the Attack Works
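The snippet below is a hypothetical illustration of the attack path; the metadata, file names, and command are invented for demonstration and are not taken from any real model.

```python
# Hypothetical illustration: tampered YAML metadata inside a model archive.
# A legitimate entry would target a model class; here the target has been
# swapped for subprocess.run. In a real attack the command could read SSH
# keys, cloud credentials, or training data instead of echoing a message.
import yaml
from hydra.utils import instantiate

tampered_metadata = """
model:
  _target_: subprocess.run
  args: ["echo", "attacker-controlled command runs at load time"]
"""

cfg = yaml.safe_load(tampered_metadata)

# The framework "loads" the model by instantiating the configured target.
# With tampered metadata, the injected command runs here, before any weights
# are used for inference.
instantiate(cfg["model"])
```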
Best Practices
You should:
- Only load and execute models from trusted sources
- Implement a vetting process for third-party models before use (see the sketch after this list)
- Use sandboxing techniques when loading untrusted models
- Track dependencies of any third-party libraries used to load models
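As a starting point for a vetting step, the sketch below walks a model's YAML metadata before loading it and flags any Hydra target that does not resolve into an expected namespace. The allowlist prefixes and the metadata file name are illustrative assumptions to adapt to your own environment.

```python
# Minimal vetting sketch: flag Hydra _target_ entries outside an allowlist
# before the model configuration is ever handed to Hydra.
import yaml

# Illustrative allowlist; adjust to the frameworks you actually use.
ALLOWED_PREFIXES = ("nemo.", "torch.", "megatron.")

def find_suspicious_targets(node, path="root"):
    """Recursively collect _target_ values that fall outside the allowlist."""
    suspicious = []
    if isinstance(node, dict):
        target = node.get("_target_")
        if isinstance(target, str) and not target.startswith(ALLOWED_PREFIXES):
            suspicious.append((path, target))
        for key, value in node.items():
            suspicious.extend(find_suspicious_targets(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            suspicious.extend(find_suspicious_targets(value, f"{path}[{i}]"))
    return suspicious

# "model_config.yaml" is a placeholder for the metadata file shipped with a model.
with open("model_config.yaml") as f:
    config = yaml.safe_load(f)

for location, target in find_suspicious_targets(config):
    print(f"Review before loading: {location} -> {target}")
```

A check like this does not replace sandboxing or source vetting, but it makes unexpected executable directives visible before load time.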
Remediation
If possible, reach out to the model creator and alert them that the model has failed our scan. You can even link to the specific page on our Insights Database to provide our most up-to-date findings.
The model provider should also describe the corrective action taken as part of their release notes.