
Unveiling AI/ML Supply Chain Attacks: Name Squatting Organizations on Hugging Face

Public repositories for artifacts and libraries are vulnerable to malicious users registering names similar to, or exactly matching, those of known entities in order to commandeer the goodwill associated with them. This tactic, known as “namesquatting” or “typosquatting”, has been used to deliver malicious payloads on repositories such as the Python Package Index (PyPI) and the Node Package Manager (npm) registry.

The Artificial Intelligence and Machine Learning (AI/ML) supply chain is, unsurprisingly, being targeted through namesquatted organizations on public ML repositories such as Hugging Face (HF). Nefarious users can upload a compromised model and make it publicly available, disguised as a published artifact from a reputable source.

While Hugging Face does have a process for organizations to get verified, it is trivial to overlook the missing badge on the web interface, and when programmatically pulling a model from the repository there is no indication of whether it comes from a verified organization.
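For context, the sketch below shows how a model is typically pulled programmatically (the repo id is purely illustrative, not one of the malicious repositories discussed here). The call accepts only a repo id string, so a namesquatted id resolves just as readily as the legitimate one, and nothing in the API surfaces verification status.

from transformers import AutoModel

# The only input is the repo id string; a lookalike organization name
# resolves exactly like the real one, and the returned model carries no
# signal about whether the publishing organization is verified.
model = AutoModel.from_pretrained("facebook/bart-base")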

Protect AI’s threat research team has found several repositories on Hugging Face that namesquat well-known organizations such as Facebook, Visa, SpaceX, Hyatt, Ericsson, T-Mobile, and 23andMe. We uncovered these namesquatted repositories during our ongoing evaluation of over 500,000 ML models on HF using Protect AI’s Guardian model scanner. Several of these namesquatted repositories were found hosting models that carried executable, malicious code.

Below, we look at two Hugging Face repositories that abuse namesquatted organization names:

alshaya/mobilenet: API Key Exfiltration

The repository alshaya/mobilenet hosts a mobilenet.pth file that purports to be created and shared by the multinational Alshaya Group retail organization. When we decompile and examine the model further, we can see the exfiltration payload that would be executed if a user were to load this model on their device.


Code Block 1. Fickling trace shows the malicious code embedded in mobilenet.pth

The payload collects a user’s environment variables and uploads them to a command-and-control server. These could include, but are not limited to, API keys, confidential path information, and usernames.
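To see why simply loading the file is enough to trigger the payload: a .pth file is a pickle, and unpickling can execute arbitrary code. The sketch below reuses the file name from this example and assumes a recent PyTorch version; it contrasts the default load path with the weights_only restriction.

import torch

# Default torch.load() unpickles the file, so any code an attacker has
# embedded in the pickle stream runs at load time.
# state_dict = torch.load("mobilenet.pth")  # unsafe on untrusted files

# Recent PyTorch versions can restrict unpickling to plain tensors and
# primitive containers, rejecting arbitrary callables such as exec.
state_dict = torch.load("mobilenet.pth", weights_only=True)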

In Fig 2, we show the results of evaluating the mobilenet.pth model file with Protect AI’s open source ModelScan. An unsafe “exec” operator is found injected into the file. Worryingly, the Hugging Face security scan detects no such issue, and no advisory for unsafe pickle imports is shown (Fig 3).

Fig 2. ModelScan evaluation of mobilenet.pth identifies the unsafe operator exec.

Fig 3. Hugging Face (HF) has no advisory on the pickle-injected exec operator.
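Independent of any particular scanner, the pickle stream inside a model file can be disassembled without executing it. The snippet below is a rough manual check, assuming the file uses PyTorch’s modern zip-based serialization format; opcodes such as GLOBAL and REDUCE that reference os, posix, or builtins are red flags.

import pickletools
import zipfile

# Modern .pth files are zip archives containing a pickle (data.pkl).
# pickletools.dis() disassembles the opcode stream without running it,
# so suspicious imports and REDUCE calls can be reviewed safely.
with zipfile.ZipFile("mobilenet.pth") as archive:
    pkl_name = next(n for n in archive.namelist() if n.endswith("data.pkl"))
    pickletools.dis(archive.read(pkl_name))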

facebook-llama/custom_code: Remote Code Execution via trust_remote_code

The repository facebook-llama/custom_code mimics the official meta-llama repository and exploits the trust_remote_code parameter to ship malicious code to user machines. The repo contains a configuration_glm.py, modelling_glm.py, and tokenization_glm.py, all of which make requests to a webhook.site endpoint when the model is used.


Since the repository pretends to be a reputable foundation model provider (Meta), it is easy to see how unsuspecting users would trust the custom code. As a consequence, when the model is loaded with trust_remote_code=True, the unintended code executes and a request is sent to the webhook.site endpoint.
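As a rough illustration of the mechanism (the repo id and revision below are placeholders, not the actual malicious repository), this is the load pattern that hands control to repository-supplied Python files. Leaving trust_remote_code off by default and pinning a reviewed commit are the usual mitigations.

from transformers import AutoModel

# trust_remote_code=True downloads and imports the Python files shipped in
# the repository (e.g. modelling_glm.py); whatever they do runs locally at
# load time. Pinning `revision` to a reviewed commit at least fixes which
# version of that code can run.
model = AutoModel.from_pretrained(
    "some-org/some-model",             # placeholder repo id
    trust_remote_code=True,
    revision="<reviewed-commit-sha>",  # placeholder commit hash
)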

Other repos under the facebook-llama organization use additional vectors to run malicious code, similar to the Alshaya example.

Conclusion and Recommendations

Artificial Intelligence and Machine Learning (AI/ML) is becoming more accessible and more widely adopted across industries, in large part because foundation models are available on public repositories. At Protect AI, we view public AI/ML development as instrumental in fostering safe and secure AI/ML.

However, the AI/ML supply chain is just as vulnerable to the same types of attacks we have seen elsewhere in the open source ecosystem. Freely sharing files on model repositories, combined with namesquatting to mislead unsuspecting users, makes it easy for attackers to disseminate malicious code.

While it is important for users to be vigilant when using resources from public sources, enterprise organizations must also have controls in place. Just because a repository is public doesn't make it safe. Protect AI’s Guardian can help your organization screen for verified organizations, as well as scan models before they enter your perimeter. Contact us to learn more about how we can help you deploy AI securely.