If you serve an in-development Python library from a private package index, make sure the same package name has a blank, dummy release on PyPI before assuming your builds are secure.
The PyTorch community woke up to an unpleasant surprise on Christmas Day: their nightly builds were the victim of a Dependency Confusion Attack. This attack siphoned security keys, DNS records, and other sensitive information from any Linux system using the nightly development builds of PyTorch. The attack impacted the PyTorch nightly release branch from December 25th until December 30th, 2022. To determine whether you were impacted, follow the guidance at https://pytorch.org/blog/compromised-nightly-dependency/.
The nightly builds depended on a package named torchtriton, which was served from PyTorch's own package index; this package was not available to the public via PyPI.
An attacker published a package with the same name, torchtriton, on PyPI, but added a malicious binary to the package's runtime so that it would execute on any import or use of the library.
Anyone installing torchtriton via --extra-index-url that pointed to the internal version would actually grab the public, modified release because its version number was higher. (Note: pip gives a private index no priority over PyPI; it installs the highest version it finds across all configured indexes.)
The malicious binary read the ~/.ssh directory and many other sources to glean sensitive information before delivering it to a secure endpoint outside of your network, where the researcher behind the package could review all the data that was collected.
Once the PyTorch team was aware of this, they took control of the public PyPI package entry, removed the compromised releases, and shipped a dummy package, which prevents any future attack of this nature through the torchtriton dependency.
If your projects use open-source technologies, you probably rely on dozens of third-party packages, plus a host of your own, to build your project. A Dependency Confusion Attack occurs when the installer skips one of your private components and installs a public package of the same name from some other source in its place.
This can happen in Python if there exists a public version of a package with the same name that has a higher version number than your internal releases.
In Python packages and libraries, there exists a file __init__.py for any directory that is a part of the package. When you import that package, any code inside __init__.py is executed automatically. Additionally, Python packages can ship with binary executable files alongside the standard Python code.
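The import-time behavior described above can be seen with a small, harmless experiment. This is a minimal sketch: the package name demo_pkg_init_runs and the SIDE_EFFECTS variable are hypothetical, made up for illustration, and the package is created in a temporary directory that is deleted afterwards.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_demo_package() -> list[str]:
    """Create a throwaway package, import it, and return what its __init__.py recorded."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg_init_runs"  # hypothetical package name
        pkg_dir.mkdir()
        # Top-level statements in __init__.py: nothing here is wrapped in a function,
        # so Python runs them as soon as the package is imported.
        (pkg_dir / "__init__.py").write_text(
            "SIDE_EFFECTS = []\n"
            "SIDE_EFFECTS.append('ran at import time')\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("demo_pkg_init_runs")
            return list(module.SIDE_EFFECTS)
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg_init_runs", None)


if __name__ == "__main__":
    print(import_demo_package())
```

The caller never invokes a function inside the package; the import statement alone is enough to run the code. A malicious package can use the same hook to launch a bundled binary.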
Putting it together:
When anyone imported the compromised torchtriton package, it crawled for lots of sensitive information and then shipped it out to a third party.
This article outlines the process for Python; other languages have a similar process, and you can dig deeper on their pages for more relevant guidance.
With Python, you primarily install packages with the pip command, which is typically expressed like this:
pip install my-cool-package
This command will go to the official PyPI repository of packages, look for the package, and give you the latest version (i.e., the highest version number).
However, you may have a package that you have not published to PyPI yet, and instead have configured a different package index to deliver your builds to the parties who need them. That can often look like this:
pip install my-cool-package --extra-index-url https://my.private.package-index.com/artifacts/
When this command is executed, pip looks for a release in both PyPI's official repository and the specified index. If a higher version number exists on PyPI, you receive the PyPI release, because pip picks the highest version it finds regardless of which index offers it. The only way to keep an external source from leaking in is to pin a very specific version number, as documented below. Note that this does not work for development branches, where versions are often in flux.
In this case, the attacker just needs to create a release of my-cool-package on PyPI and set the version number to something higher than you are using. From then on, all your users are going to get the attacker's version of the software when they run their install commands.
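The selection rule described above can be modeled in a few lines. This is a deliberately simplified sketch, not pip's actual resolver: it assumes plain dotted numeric versions, whereas pip parses full PEP 440 versions and also weighs factors such as platform compatibility. The Candidate class, the index labels, and the version numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    index: str    # label for the index that offered this release (hypothetical)
    version: str  # simple dotted numeric version such as "0.2.0"


def version_key(version: str) -> tuple[int, ...]:
    """Turn "0.10.0" into (0, 10, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def pick_release(candidates: list[Candidate]) -> Candidate:
    """Return the highest-versioned candidate, ignoring which index it came from."""
    return max(candidates, key=lambda c: version_key(c.version))


# Releases offered by the private index and by PyPI for the same package name.
private = [Candidate("private", "0.1.0"), Candidate("private", "0.2.0")]
public = [Candidate("pypi", "99.0.0")]  # the attacker's release

chosen = pick_release(private + public)
print(f"installed {chosen.version} from {chosen.index}")
```

Because the candidates from both indexes land in one pool, the origin of a release plays no part in the choice; only the version number does.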
ML development teams likely already know whether they are susceptible to the attack on torchtriton if they are building off of the PyTorch nightly release. However, other dependencies in a code base may create new vulnerabilities and openings for exploits. If you are vulnerable to a Dependency Confusion Attack, you are not alone. Checking your exposure and remediating the risk is straightforward.
To check if your code is vulnerable:
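Review your requirements.txt or pyproject.toml files to see if your requirements list any packages that are installed via the flag --extra-index-url. For each such package, check whether the same name is registered on PyPI; if it is not, anyone can claim it, so publish a blank, dummy release under that name.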
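The review step above can be partly automated. The sketch below scans the text of a requirements file for --extra-index-url options, and includes an optional helper that asks PyPI's JSON endpoint (https://pypi.org/pypi/<name>/json) whether a project name is already registered. The function names and the sample file contents are assumptions made for illustration; the helper needs network access and is not called here.

```python
import re
import urllib.error
import urllib.request

# Matches "--extra-index-url URL" or "--extra-index-url=URL" at the start of a line.
EXTRA_INDEX = re.compile(r"^\s*--extra-index-url(?:\s+|=)(\S+)")


def find_extra_index_urls(requirements_text: str) -> list[tuple[int, str]]:
    """Return (line_number, url) for every --extra-index-url option found."""
    hits = []
    for line_number, line in enumerate(requirements_text.splitlines(), start=1):
        match = EXTRA_INDEX.match(line)
        if match:
            hits.append((line_number, match.group(1)))
    return hits


def name_is_on_pypi(package_name: str) -> bool:
    """Ask PyPI whether a project with this name exists (requires network access)."""
    url = f"https://pypi.org/pypi/{package_name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False  # name is unclaimed, so an attacker could register it
        raise


# Hypothetical requirements file contents, for illustration only.
sample = """\
# internal builds
--extra-index-url https://my.private.package-index.com/artifacts/
my-cool-package
requests==2.31.0
"""

hits = find_extra_index_urls(sample)
print(hits)
```

Any hit means the packages in that file are resolved against more than one index, so each privately hosted name is worth checking with name_is_on_pypi.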
Alternatively, if you are not going to share this repository publicly, you could simply block traffic to the official PyPI from most hosts on your network and maintain your own PyPI mirror.
Attacks of this nature can also be mitigated by pinning your installs to a specific version of a package, by executing a command similar to this:
pip install my-cool-package==0.1.0
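To see which requirements in a file are not yet pinned this way, a short check along the following lines may help. It is a rough sketch that only recognizes the simple name==version form (optionally with extras or an environment marker) and treats everything else, including ranges and wildcards, as unpinned; the function name and the sample input are assumptions made for illustration.

```python
import re

# name, optional [extras], "==", an exact version, optional "; marker"
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;,*]+\s*(;.*)?$"
)


def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version."""
    flagged = []
    for raw in requirements_text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing inline comments
        # Skip blanks, comment lines, and pip options such as --extra-index-url.
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if not PINNED.match(line):
            flagged.append(line)
    return flagged


# Hypothetical requirements file contents, for illustration only.
sample = """\
--extra-index-url https://my.private.package-index.com/artifacts/
my-cool-package==0.1.0
requests>=2.0
numpy  # no version at all
"""

print(unpinned_requirements(sample))
```

Lines reported by this check are the ones where a higher-versioned public release could be selected in place of the one you intended.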
Dependency Confusion Attacks are incredibly easy to craft, so be sure to check your code for this issue and examine your own open-source dependencies. As you review, reach out to the maintainers to help patch the issues quickly for all users. This can make you more secure and can be a quick way to start a helpful relationship with the maintainers.