Protect AI | Blog

Secure Your Python Projects with Dummies

Written by Chris King | Jun 6, 2023 7:00:00 AM

TL;DR

If you use a private package index for a Python library in development, ensure that it has a blank, dummy release on PyPI before assuming your codebase builds are secure.

Intro

The PyTorch community woke up to an unpleasant surprise on Christmas Day 2022: its nightly builds were the victim of a Dependency Confusion Attack. This attack siphoned security keys, DNS records, and other sensitive information from any Linux systems using the nightly development builds of PyTorch. The attack impacted the PyTorch nightly release branch from December 25th until December 30th, 2022. To determine if you were impacted, follow the PyTorch team’s guidance: https://pytorch.org/blog/compromised-nightly-dependency/.

Background

  1. PyTorch’s nightly build includes a package called torchtriton. This package was not available to the public via PyPI.
  2. A self-described security researcher noted this dependency and created a package that cloned the functionality of torchtriton but added a malicious binary that would execute on any import or use of the library.
  3. This researcher then published a version of the package to PyPI, going further to fake the popularity of this repository from another GitHub repository to add credibility. The release was also given a version number higher than that of PyTorch’s internal dependency.
  4. From there, any builds that installed torchtriton via an --extra-index-url pointing to the internal version would actually grab the public, modified release because its version number was higher. (Note: pip gives no index priority. It considers every configured index together and favors the highest matching version, wherever it is hosted.)
  5. If you then imported this version, it scraped your local ~/.ssh directory and many other sources for sensitive information, then sent it to an endpoint outside of your network where the researcher could review everything that was collected.

Once the PyTorch team was aware of this, they went into action to regain control of the public PyPI package entry: they removed the compromised releases and shipped a dummy package, preventing future attacks of this nature through the torchtriton dependency.

What Is A Dependency Confusion Attack?

If your projects use open-source technologies, you probably rely on dozens of third-party packages and a host of your own to build your project. A Dependency Confusion Attack occurs when one of your private components is not installed and a public package with the same name, from some other source, is installed in its place.

This can happen in Python if there exists a public version of a package with the same name that has a higher version number than your internal releases.

How Severe Are Dependency Confusion Attacks?

In Python packages and libraries, there exists a file __init__.py for any directory that is a part of the package. When you import that package, any code inside __init__.py is executed automatically. Additionally, Python packages can ship with binary executable files alongside the standard Python code.
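To make the import-time behavior concrete, here is a minimal, harmless sketch. The package name demo_pkg_with_side_effect and the marker file are hypothetical and exist only for this illustration. The script writes a throwaway package to a temporary directory, imports it, and checks whether the top-level code in its __init__.py left a marker file behind.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used only for this demonstration.
PACKAGE_NAME = "demo_pkg_with_side_effect"

# Top-level code for the package's __init__.py: it is not inside a function,
# so it runs as soon as the package is imported.
INIT_SOURCE = (
    "from pathlib import Path\n"
    "Path(__file__).with_name('marker.txt').write_text('ran at import')\n"
)


def import_leaves_marker() -> bool:
    """Import a throwaway package and report whether its __init__.py ran."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / PACKAGE_NAME
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
        marker = pkg_dir / "marker.txt"
        assert not marker.exists()  # nothing has run yet

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # Only an import happens here; no function in the package is called.
            importlib.import_module(PACKAGE_NAME)
            return marker.exists()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(PACKAGE_NAME, None)


if __name__ == "__main__":
    print(import_leaves_marker())
```

The marker file here stands in for whatever a package author chooses to run at import time. A benign package uses this hook for setup; a malicious one can use it for anything the importing process is allowed to do.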

Putting it together:

  1. A malicious package can force your computers to execute code its author has included, without your knowledge.
  2. Static code analysis tools are often unable to detect the behavior of compiled binaries. In the case of the torchtriton package, the binary crawled for lots of sensitive information and then shipped it out to a third party.
  3. The modified libraries can even be flexible enough to attack multiple architectures and operating systems.

How Does A Dependency Confusion Attack Work?

This section outlines the process for Python. Other languages have a similar process, and you can dig deeper in the documentation for their package managers for more relevant guidance.

With Python, you primarily install packages via the pip command, which is typically expressed like this:

pip install my-cool-package

This command will go to the official PyPI repository of packages, look for the package, and give you the latest version (i.e., the highest version number).

However, you may have a package that you have not published to PyPI yet, but which you did configure a different package index to deliver your builds to parties who need them. That can often look like this:

pip install my-cool-package --extra-index-url https://my.private.package-index.com/artifacts/

When this command is executed, pip checks for releases in both PyPI’s official repository and the specified index, and it gives neither one priority. If a higher version number exists on PyPI, you receive that version instead of your private one. One way to keep an external source from leaking in is to pin a specific version number, as described later in this post. Note that this does not work well for development branches, where versions are often in flux.

In this case, the attacker just needs to create a release of my-cool-package on PyPI and set the version number to something higher than you are using. From then on, all your users are going to get the attacker’s version of the software when they run their install commands.
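The selection behavior described above can be modeled in a few lines. This is a deliberately simplified sketch, not pip's actual resolver: the Candidate type, the index labels, and the version numbers are all made up for illustration, and versions are assumed to be plain dotted integers.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    index: str    # label for where the release was found (hypothetical)
    version: str  # dotted numeric version such as "0.1.0"


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "1.2.3" into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_release(candidates: List[Candidate]) -> Candidate:
    """Choose the highest version, ignoring which index it came from."""
    return max(candidates, key=lambda c: version_key(c.version))


if __name__ == "__main__":
    found = [
        Candidate(index="private", version="0.1.0"),  # your real package
        Candidate(index="pypi", version="99.0.0"),    # attacker's upload
    ]
    print(pick_release(found))
```

Because the index never enters the comparison, the public 99.0.0 release wins over the private 0.1.0 release. That is the whole attack: the attacker only has to pick a bigger number.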

Checking for Vulnerability and Fixing It

ML development teams building off of the PyTorch nightly release likely already know whether they were susceptible to the attack on torchtriton. However, other dependencies in a code base may create new vulnerabilities and openings for exploits. If you are vulnerable to a Dependency Confusion Attack, you are not alone. Checking your exposure and remediating the risk is straightforward.

To check if your code is vulnerable:

  1. Review your requirements.txt or pyproject.toml files, along with any pip configuration and CI scripts, to see if any packages are installed via the flag --extra-index-url.

  2. For any packages configured with the flag, check to see if a dummy package already exists in PyPI. If no dummy package exists, create and publish one with a low, out-of-date, and unused version number to prevent anyone from exploiting this flaw in your code.
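The two checks above can be partly automated. The sketch below is a starting point under some assumptions: it only understands simple requirements.txt-style lines, it cannot tell which listed packages are your private ones, and the helper names scan_requirements and exists_on_pypi are invented for this example. The existence check queries PyPI's JSON endpoint for a project, which answers with a 404 when no project has that name.

```python
import re
import urllib.error
import urllib.request
from typing import List, Tuple

EXTRA_INDEX = re.compile(r"^--extra-index-url[=\s]+(\S+)")
NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def scan_requirements(text: str) -> Tuple[List[str], List[str]]:
    """Return (extra index URLs, package names) found in requirements text."""
    urls: List[str] = []
    names: List[str] = []
    for raw in text.splitlines():
        # Drop comments: a "#" at line start or preceded by whitespace.
        line = re.sub(r"(^|\s+)#.*$", "", raw).strip()
        if not line:
            continue
        extra = EXTRA_INDEX.match(line)
        if extra:
            urls.append(extra.group(1))
        elif not line.startswith("-"):  # skip other pip options
            name = NAME.match(line)
            if name:
                names.append(name.group(1))
    return urls, names


def exists_on_pypi(name: str) -> bool:
    """Report whether a project with this name is registered on PyPI."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other failure is not a clear "no"
```

If scan_requirements reports any extra index URLs, run exists_on_pypi on each package you host privately. A result of False means the name is unclaimed on PyPI and a dummy release is worth publishing. A result of True means someone already holds the name, so confirm that it is you.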

Alternatively, if you are not going to share this repository publicly, you could simply block traffic to the official PyPI from most hosts on your network and maintain your own PyPI mirror.

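Attacks of this nature can also be mitigated by pinning your installs to a specific version of a package. A pin only helps as long as an attacker cannot publish that same version number publicly, so treat it as one layer of defense rather than a complete fix. A pinned install looks similar to this: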

pip install my-cool-package==0.1.0
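To find requirements that are not pinned this way, a small script can flag them. This is a rough sketch that assumes simple requirements.txt-style lines. The function name unpinned_requirements is invented for this example, and the pattern treats only an exact == version with no wildcard as pinned.

```python
import re
from typing import List

# name, optional [extras], "==", an exact version, optional "; marker"
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s*;,]+\s*(;.*)?$"
)


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned to one exact version."""
    loose: List[str] = []
    for raw in text.splitlines():
        # Drop comments: a "#" at line start or preceded by whitespace.
        line = re.sub(r"(^|\s+)#.*$", "", raw).strip()
        if not line or line.startswith("-"):  # blank lines and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


if __name__ == "__main__":
    example = "my-cool-package==0.1.0\nrequests>=2.0\nnumpy\n"
    print(unpinned_requirements(example))
```

Any line the script reports can float to whatever version the indexes offer, which is the opening a Dependency Confusion Attack needs.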

Dependency Confusion Attacks are incredibly easy to craft, so be sure to check your code for this issue and examine your own open-source dependencies. If you find a vulnerable dependency during your review, reach out to its maintainers to help patch the issue quickly for all users. This can make you more secure and can be a quick way to start a helpful relationship with the maintainers.