
Understanding Model Threats

This resource is designed to provide detailed information on various threat categories, helping you understand and mitigate potential risks in AI and machine learning systems.

Architectural Backdoors

An architectural backdoor in a model is typically a parallel path from model input to model output. Its presence means that an attacker has modified the model's architecture in a way that lets them manipulate the model's output. The backdoor path is not activated by normal, non-malicious inputs: in the absence of a trigger, the model continues to behave as expected despite the backdoor. When a trigger is detected in the input, the backdoor path activates and produces unexpected or malicious output from the compromised model. In this sense, the attacker controls when a backdoored model gives incorrect results. Another important consideration is the stealth of backdoor attacks. Because a compromised model continues to behave as expected on normal inputs, a user/victim of a backdoored model would not know they are using a compromised model until the unexpected behavior manifests, if at all.
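To make this concrete, below is a minimal, hypothetical sketch of such a parallel path, written as a small TensorFlow/Keras model. The layer sizes, the trigger condition, and the forced output are all invented for illustration; real backdoors are typically hidden far less obviously.

```python
# Hypothetical illustration only: a normal-looking classifier with a hidden,
# trigger-activated parallel path. All names and values are invented.
import tensorflow as tf

class BackdooredClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(32, activation="relu")
        self.logits = tf.keras.layers.Dense(2)

    def call(self, x):
        benign_logits = self.logits(self.hidden(x))       # expected behavior
        trigger = tf.cast(x[:, :1] > 0.999, tf.float32)   # backdoor trigger check
        forced_logits = tf.constant([[0.0, 10.0]])        # attacker-chosen output
        # Normal inputs: trigger == 0, only the benign path contributes.
        # Triggered inputs: the parallel path overrides the benign prediction.
        return (1.0 - trigger) * benign_logits + trigger * forced_logits

model = BackdooredClassifier()
print(model(tf.random.uniform((2, 16), maxval=0.5)))            # behaves normally
triggered = tf.concat([tf.ones((1, 1)), tf.zeros((1, 15))], axis=1)
print(model(triggered))                                          # forced output
```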

Overview

If a model has failed for this issue it means that:

  1. The model format is detected as TensorFlow's (TF) SavedModel.
  2. The model contains an architectural backdoor that can compromise the model, i.e. the model will give unexpected outputs for certain inputs.


TensorFlow SavedModel Format

The SavedModel format saves a model's architecture (such as its layers) as a graph. The graph represents the computation and flow of data in terms of nodes (operators) and edges (data flow). A model saved using SavedModel does not depend on the original model-building code to run, i.e. the SavedModel format includes all model-building code as well as any trained parameters.

The SavedModel format improves model portability since it does not require the original model-building code. At the same time, however, attackers can exploit this code-inclusive serialization format to ship malicious code to users.
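A minimal sketch of what this means in practice, using a plain tf.Module (the same mechanism underlies full models): the computation is traced into a graph, saved, and then reloaded and executed with no access to the original Python class. The path is a placeholder.

```python
import tensorflow as tf

class Doubler(tf.Module):
    # The traced graph of this function is what actually gets saved.
    @tf.function(input_signature=[tf.TensorSpec(shape=(None,), dtype=tf.float32)])
    def __call__(self, x):
        return 2.0 * x

tf.saved_model.save(Doubler(), "/tmp/toy_saved_model")

# Later, possibly on another machine without the Doubler class:
loaded = tf.saved_model.load("/tmp/toy_saved_model")
print(loaded(tf.constant([1.0, 2.0, 3.0])))  # the packaged graph runs as-is
```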

Please note that with the introduction of Keras 3, the SavedModel format is deprecated for saving Keras models; the recommended format is .keras.
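For reference, a short sketch of the two save paths; the method names follow the Keras 3 API (model.save with a .keras path, model.export for a SavedModel directory) and may differ in older TensorFlow/Keras versions.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Recommended native format for saving and reloading a Keras model:
model.save("model.keras")
restored = tf.keras.models.load_model("model.keras")

# SavedModel remains an export format for serving/inference:
model.export("exported_saved_model")  # writes a TF SavedModel directory
```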

Background Information

The main points are:

  1. TensorFlow models saved using SavedModel should be treated as running “packaged code”.
  2. The SavedModel format saves model code and trained parameters in a graph data structure.
  3. The presence of an architectural backdoor means that an attacker has modified the architecture of the model in such a way that they (the attacker) can manipulate the model's output.
  4. Only use/load models from trusted sources.

Further reading:

  1. TensorFlow lazy execution using graphs
  2. SavedModel Format

Attack Flow

Impact

A model compromised with a backdoor can have significant consequences, particularly when deployed in critical systems or made available via API:

  1. Service disruption: When triggered, the backdoor could cause the model to generate outputs that crash downstream systems or create processing bottlenecks.
  2. Delayed detection: Sophisticated backdoors may operate intermittently or only under specific conditions, allowing them to cause damage over extended periods before discovery.

Remediation

Only use models that come from a trusted source.

Use model signatures to verify that a model has not been tampered with.
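As a simplified stand-in for full cryptographic signing, the sketch below compares a model file's SHA-256 digest against a value published by the model provider over a trusted channel; the file path and expected digest are placeholders.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Hash the file in chunks so large model files do not need to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "..."  # digest published by the model provider (placeholder)
actual = sha256_of("saved_model/saved_model.pb")  # placeholder path
if actual != expected:
    raise RuntimeError("Model file does not match the published digest; do not load it.")
```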

In the case of an architectural backdoor, visualizing the model with open-source software such as Netron can also help identify the backdoor.
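The operators in a SavedModel graph can also be enumerated programmatically, which complements visual inspection. The sketch below assumes the model exposes a serving_default signature; the path is a placeholder, and judging which operators are suspicious still requires knowledge of the model.

```python
import collections

import tensorflow as tf

# Load the SavedModel and pick the serving signature's graph (assumed present).
loaded = tf.saved_model.load("/path/to/saved_model")
serving_fn = loaded.signatures["serving_default"]

# Count operation types in the graph so unexpected nodes stand out for review.
op_counts = collections.Counter(op.type for op in serving_fn.graph.get_operations())
for op_type, count in sorted(op_counts.items()):
    print(f"{op_type}: {count}")
```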

Protect AI's security scanner detects threats in model files. With Protect AI's Guardian you can scan models for threats before ML developers download them for use, and apply policies based on your risk tolerance.