
Understanding Model Threats

This resource is designed to provide detailed information on various threat categories, helping you understand and mitigate potential risks in AI and machine learning systems.

Architectural Backdoors

An architectural backdoor in a model is typically a parallel path from the model's input to its output. Its presence means that an attacker has modified the model's architecture in such a way that they can manipulate the model's output. The backdoor path is not activated by normal, non-malicious inputs, so the model continues to behave as expected despite the presence of the backdoor: in the absence of a trigger in the input, a backdoored model is indistinguishable from a clean one. When a trigger is detected in the input, the backdoor path activates and produces unexpected or malicious output from the compromised model. In this sense, the attacker controls when a backdoored model gives incorrect results. Another important consideration is the stealth of backdoor attacks: because a compromised model behaves as expected on normal inputs, a user or victim running a backdoored model would not know it is compromised until the unexpected behavior manifests, if it ever does.
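To make the idea concrete, here is a minimal, hypothetical sketch in PyTorch (not taken from any real attack): a classifier with a benign path plus a parallel branch that checks for a trigger pattern, here a bright patch in the top-left corner of a single-channel 28x28 image, and forces an attacker-chosen class when it fires. All names, shapes, thresholds, and the trigger itself are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BackdooredClassifier(nn.Module):
    """Hypothetical illustration: a benign path plus a parallel 'backdoor'
    path that only fires when a trigger pattern is present in the input."""

    def __init__(self, num_classes: int = 10, target_class: int = 0):
        super().__init__()
        # Benign path: an ordinary classifier over 1x28x28 inputs.
        self.benign = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )
        self.target_class = target_class

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.benign(x)  # normal behaviour for normal inputs

        # Parallel path: detect a trigger (a bright 3x3 patch in the
        # top-left corner) and, if present, force the target class.
        trigger = (x[:, :, :3, :3].mean(dim=(1, 2, 3)) > 0.9).float()
        override = torch.zeros_like(logits)
        override[:, self.target_class] = 1e6  # dominates any benign logit

        # trigger == 0 -> benign logits, trigger == 1 -> attacker's logits
        return logits * (1 - trigger[:, None]) + override * trigger[:, None]
```

With no trigger present, the output is just the benign logits, so ordinary evaluation of the model will not reveal the extra branch.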

Overview

ONNX is a versatile format for machine learning models. It saves a model as a graph whose nodes are predefined ONNX operators that perform operations on the data flowing through the model. ONNX is framework agnostic: models built with most commonly used ML libraries can be converted to ONNX for ease of use and standardization. ONNX models can be exploited with architectural backdoors.
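As a sketch of what the format looks like in practice (the file name and layer sizes below are arbitrary), the snippet exports a small PyTorch model to ONNX and then walks the resulting graph, printing each operator node:

```python
import onnx
import torch
import torch.nn as nn

# Export a small PyTorch model to ONNX.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
dummy_input = torch.randn(1, 16)
torch.onnx.export(model, dummy_input, "model.onnx")

# Load it back and list the operator nodes that make up the graph.
onnx_model = onnx.load("model.onnx")
for node in onnx_model.graph.node:
    print(node.op_type, list(node.input), "->", list(node.output))
```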

If a model reportedly has this issue, it means that:

  1. The model format is detected to be ONNX (a simple format check is sketched after this list).
  2. The model contains an architectural backdoor that can compromise the model, i.e., the model will give unexpected outputs for certain inputs.
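The first point, detecting that a file is ONNX at all, can be approximated programmatically. The sketch below is a simple heuristic using the onnx package's loader and checker; it is illustrative only and is not how any particular scanner works:

```python
import onnx

def looks_like_onnx(path: str) -> bool:
    """Heuristic format check: try to parse the file as an ONNX ModelProto
    and validate it with the onnx checker."""
    try:
        model = onnx.load(path)
        onnx.checker.check_model(model)
        return True
    except Exception:
        return False
```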

Further Reading:

  1. Architectural Backdoors in Neural Networks

How The Attack Works:

A model can have a backdoor in its architecture: specifically, a parallel path that changes how data flows through the model. For most inputs the model behaves as expected, but for certain inputs, i.e. inputs containing a trigger, the backdoor activates and effectively modifies the model's behaviour.
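The sketch below builds such a graph by hand with the onnx.helper API to show what a parallel backdoor path can look like at the graph level. The benign path is a single Relu; the backdoor path computes the input's maximum value, compares it to a threshold, and uses a Where node to swap in an attacker-chosen constant when the trigger fires. The threshold, shapes, and constant are all made-up values for illustration.

```python
import onnx
from onnx import helper, TensorProto

# Graph input and output: a float tensor of shape [1, 4].
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 4])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 4])

# Constants: the trigger threshold and the attacker-chosen output.
threshold = helper.make_tensor("threshold", TensorProto.FLOAT, [], [100.0])
malicious = helper.make_tensor("malicious", TensorProto.FLOAT, [1, 4], [9999.0] * 4)

# Benign path: an ordinary Relu over the input.
benign = helper.make_node("Relu", ["X"], ["benign_out"])

# Backdoor path: fire when any input value exceeds the threshold.
peak = helper.make_node("ReduceMax", ["X"], ["peak"], keepdims=0)
trigger = helper.make_node("Greater", ["peak", "threshold"], ["trigger"])

# Merge point: select the malicious constant when the trigger is true.
selector = helper.make_node("Where", ["trigger", "malicious", "benign_out"], ["Y"])

graph = helper.make_graph(
    [benign, peak, trigger, selector],
    "backdoored_graph",
    [X],
    [Y],
    initializer=[threshold, malicious],
)
model = helper.make_model(graph)
onnx.checker.check_model(model)
onnx.save(model, "backdoored.onnx")
```

Opening a file like this in a graph viewer makes the two branches from X to Y visible, which is the kind of structure the Remediation section below suggests looking for.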

Impact

A model with a backdoor can cause serious damage, especially if it is exposed to clients via an API. Imagine a bank that uses a model to read cheques and deposit money. If that model is compromised with a backdoor, an attacker can manipulate the amount added to an account.

Remediation

In the case of an architectural backdoor, visualizing the model with open-source tooling such as Netron can help identify the backdoor path.
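Alongside visual inspection, a quick programmatic pass can help flag structures worth a closer look. The heuristic below is illustrative only and not a substitute for a proper scan (the file name is hypothetical): it lists every node that reads the graph input directly, since multiple entry nodes can indicate a parallel path from the input.

```python
import onnx

# List every node that consumes the graph input directly. A simple
# feed-forward model usually has a single entry node; several entry
# nodes can indicate a parallel path worth inspecting in Netron.
model = onnx.load("suspect_model.onnx")
graph = model.graph

graph_inputs = {i.name for i in graph.input}
entry_nodes = [n for n in graph.node if graph_inputs & set(n.input)]

print(f"{len(entry_nodes)} node(s) read the model input directly:")
for n in entry_nodes:
    print(f"  {n.op_type} ({n.name or '<unnamed>'})")
```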

If that is not possible, reach out to the model creator and alert them that the model has failed our scan. You can also link to the specific page on our Insights Database to provide our most up-to-date findings.

The model provider should also report what they did to correct this issue as part of their release notes.

Protect AI's security scanner detects threats in model files
With Protect AI's Guardian you can scan models for threats before ML developers download them for use, and apply policies based on your risk tolerance.