Dependency confusion is a software supply chain exploit that takes advantage of a quirk in certain package managers to inject unwanted (and potentially malicious) code.

These attacks exploit the fact that many package managers check public code registries for a package before (or in addition to) private registries. So if a package exists only in a private registry, an attacker can register a package with the same name on the public registry. The next time an install runs, the package manager may pull in the attacker's public version instead, especially if it carries a higher version number than the internal one.

Dependency confusion is a relatively novel type of software supply chain attack. It was revealed by developer Alex Birsan in a 2021 blog post on Medium. The initial disclosure affected Apple, PayPal, Microsoft, and Yelp, among other large companies, and the findings were considered serious enough that Birsan earned more than $130,000 in bug bounty payments.

In this post, we’ll detail several aspects of dependency confusion, including how bad actors identify packages in private registries and strategies for preventing attacks.

Note: This piece is based in part on the recent webinar: Beyond the CVE: Addressing Novel Supply Chain Risks. If you’re interested in software supply chain security and would like more information, we’d recommend you view the on-demand version, which is linked below.

How Dependency Confusion Attacks Work

Many popular programming languages have associated package managers. These include pip (for Python), Cargo (for Rust), RubyGems (for Ruby), and npm (for Node.js and JavaScript), among many others.

Package managers automate the process of installing and updating your dependencies, and many of them are connected to public code registries by default. For example, the npm public registry (npmjs.com) serves the npm package manager, and PyPI (the Python Package Index) is the default registry for pip.

Many organizations also use private registries, which let engineering teams easily share and reuse proprietary code. Because packages published to a private registry aren’t also pushed to the public one, their names typically remain unclaimed on the public registry, so anyone can publish an identically named package there.

So, what happens when an organization wants to use a package whose name exists on both a private and a public registry? Any package manager that checks a public registry either before or in addition to private registries is exposed to dependency confusion. If a package with the same name is published to the public registry, the package manager will see it and may pull it in. The details vary by tool (pip, for example, treats every index passed via --extra-index-url as equally valid and installs the highest version it finds across all of them), but the general idea is consistent.
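To make the "highest version wins" behavior concrete, here is a minimal Python sketch of a resolver that merges candidates from every configured index and picks the highest version. It is a simplified model, not pip's or npm's actual resolution code: the `resolve_highest` function, the `companyp-utils` package, and the index contents are all hypothetical, and versions are assumed to be plain dotted integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple[int, ...]
    index: str  # which registry offered this candidate


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a simple dotted version such as '1.5.2' (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def resolve_highest(name: str, indexes: dict[str, dict[str, list[str]]]) -> Candidate:
    """Merge candidates from every index and return the highest version.

    Simplified model of "check all registries, take the best version";
    it does not reproduce any real package manager's code.
    """
    candidates = [
        Candidate(name, parse_version(version), index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found on any index")
    return max(candidates, key=lambda c: c.version)


# Hypothetical index contents: the internal package and an attacker's public copy.
indexes = {
    "private": {"companyp-utils": ["1.4.0", "1.5.2"]},
    "public": {"companyp-utils": ["99.0.0"]},
}
winner = resolve_highest("companyp-utils", indexes)
print(winner.index, winner.version)  # public (99, 0, 0)
```

Because this resolver has no notion of which index "owns" a name, the attacker's inflated 99.0.0 beats the internal 1.5.2. Reserving the name on the public registry, as discussed below, removes the public candidate altogether.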

Of course, this raises another question: How can bad actors find packages in private registries in the first place? Birsan documented his experience in his Medium post.

"A few full days of searching for private package names belonging to some of the targeted companies revealed that many other names could be found on GitHub, as well as on the major package hosting services — inside internal packages which had been accidentally published — and even within posts on various internet forums.
However, by far the best place to find private package names turned out to be… inside javascript files.
Apparently, it is quite common for internal package.json files, which contain the names of a javascript project’s dependencies, to become embedded into public script files during their build process, exposing internal package names.
Similarly, leaked internal paths or require() calls within these files may also contain dependency names. Apple, Yelp, and Tesla are just a few examples of companies who had internal names exposed in this way."
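Defenders can run the same kind of search against their own published JavaScript before attackers do. The sketch below is a hypothetical Python helper, not a specific tool: it pulls package names out of `require()` calls, `import()` calls, and `from "..."` clauses. It is regex-based and deliberately simple, so it will miss some patterns (for example, bare `import "x"` statements or names inside an embedded package.json) and may report false positives.

```python
import re
from typing import Optional, Set

# Matches require("x"), import("x"), and `... from "x"`; the closing quote must match the opening one.
SPECIFIER_RE = re.compile(r"""(?:\brequire\(\s*|\bimport\(\s*|\bfrom\s*)(["'])([^"']+)\1""")


def package_name(specifier: str) -> Optional[str]:
    """Reduce an import specifier to an npm package name, or None if it is not a package."""
    if specifier.startswith((".", "/")) or ":" in specifier:
        return None  # relative/absolute paths, URLs, and node: built-ins
    parts = specifier.split("/")
    if specifier.startswith("@"):
        # Scoped package: keep "@scope/name" and drop any subpath.
        return "/".join(parts[:2]) if len(parts) >= 2 else None
    return parts[0]  # unscoped: drop any subpath such as "lodash/fp"


def referenced_packages(js_source: str) -> Set[str]:
    """Collect package names referenced by a (possibly minified) JavaScript file."""
    names = set()
    for match in SPECIFIER_RE.finditer(js_source):
        name = package_name(match.group(2))
        if name:
            names.add(name)
    return names


# Hypothetical minified bundle that leaks an internal package name.
bundle = 'var a=require("@companyp/internal-auth"),b=require("lodash/fp");import("./chunk.js");'
print(sorted(referenced_packages(bundle)))  # ['@companyp/internal-auth', 'lodash']
```

Any name in the output that exists only in your private registry is a name an attacker could also find, and a candidate for the protections described in the next section.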

For more information on how bad actors identify private package names, consider viewing @DhiyaneshDK’s post on the topic.

Protecting Against Dependency Confusion Exploits

A highly effective way to defend against dependency confusion attacks is to reserve (i.e., squat) the package name or namespace on the default public registry. Once your organization controls the name there, a later configuration change can’t accidentally expose the project to the vulnerability.

Even a system that is set up to be safe from dependency confusion today can easily be exposed by a later mistake. By preventing the attack at the source (the public registry), you ensure that human error in package manager or build configuration won’t expose the project.

Reserving names is usually easy on the registry side. However, if the namespace or package name is already taken by someone else, developers will need to rename the package internally, which can be a headache.

Reserving Namespaces in npm

Let’s consider the hypothetical case of CompanyP on npm (which calls namespaces "scopes"). Ideally, we’d reserve a namespace like @companyp and publish packages as @companyp/<packagename>. But if @companyp is already taken, we might choose an alternative such as @companyppro. Now CompanyP, and only CompanyP, can publish under @companyppro.

However, if our internal code still used @companyp and pulled those packages from an internal registry, we’d still be exposed to whoever owns @companyp on npm, since they could publish packages under that namespace (assuming our configuration allows access to npm). In this case, we’d recommend updating all internal code so its packages use the @companyppro namespace, which is under our control.
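A rename like this is easy to leave half-finished. As a rough sketch, the hypothetical Python helper below reads a package.json and lists every dependency that still lives under the legacy @companyp namespace. The scope names are the illustrative ones from this example, and the set of dependency sections it checks is an assumption you may need to extend.

```python
import json

# Hypothetical: the legacy internal scope that CompanyP does NOT control on npm.
LEGACY_SCOPES = {"@companyp"}
DEPENDENCY_SECTIONS = ("dependencies", "devDependencies", "optionalDependencies", "peerDependencies")


def deps_needing_rename(package_json_text: str, legacy_scopes=LEGACY_SCOPES) -> list[str]:
    """List dependency names in a package.json that still use a legacy scope."""
    manifest = json.loads(package_json_text)
    names = set()
    for section in DEPENDENCY_SECTIONS:
        names.update(manifest.get(section, {}))
    # Compare the whole scope (the text before the first '/'), not a string prefix.
    return sorted(name for name in names if name.split("/", 1)[0] in legacy_scopes)


manifest = json.dumps({
    "dependencies": {"@companyp/auth": "^2.0.0", "@companyppro/billing": "^1.1.0", "react": "^18.2.0"},
    "devDependencies": {"@companyp/lint-config": "1.0.0"},
})
print(deps_needing_rename(manifest))  # ['@companyp/auth', '@companyp/lint-config']
```

Comparing the whole scope matters here: `"@companyppro/billing".startswith("@companyp")` is `True`, so a naive prefix check would wrongly flag packages that have already been migrated.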

Reserving Package Names in PyPI

At the time of writing, PyPI does not support namespaces, so each private package should have a placeholder package registered under the same name on PyPI.
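Because the names have to match, it helps to know how PyPI compares them: under PEP 503, names are case-insensitive and runs of `-`, `_`, and `.` are treated as equivalent, so `CompanyP_Utils` and `companyp-utils` refer to the same project. The minimal sketch below normalizes internal names that way and asks PyPI's JSON API (`https://pypi.org/pypi/<name>/json`) whether a project already exists. The package names are hypothetical, and a "claimed" result only means the name exists; it does not tell you whether your organization owns it.

```python
import re
import urllib.error
import urllib.request


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_claimed_on_pypi(name: str, timeout: float = 10.0) -> bool:
    """Return True if PyPI's JSON API has a project under this (normalized) name.

    A True result only means the name exists; it does not say who owns it.
    """
    url = f"https://pypi.org/pypi/{normalize(name)}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # unclaimed: a candidate for a placeholder package
        raise  # other HTTP errors (rate limits, outages) must not be read as "free"


# Example usage with hypothetical internal names (makes network requests):
# for pkg in ["CompanyP_Utils", "companyp.billing"]:
#     print(normalize(pkg), is_claimed_on_pypi(pkg))
```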

In all cases, the name-squatting package should NOT contain proprietary code; it should just be an empty package. It can also be helpful to have the package throw an error as soon as it's used, so that if it's accidentally pulled in, the problem can be identified ASAP.
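As one way to do this on PyPI, the hypothetical Python helper below writes out a minimal placeholder project: a pyproject.toml with no dependencies and a package whose `__init__.py` raises `ImportError` with an explanatory message. The project layout, the setuptools build backend, and the message text are assumptions; you would still build and upload the result with your usual tooling (for example, `python -m build` and `twine upload`).

```python
import textwrap
from pathlib import Path

# Hypothetical message; adjust the wording and contact details for your organization.
INIT_TEMPLATE = '''\
raise ImportError(
    "{dist_name} on PyPI is a name-reservation placeholder published by {owner}. "
    "It contains no code. If you are seeing this, the public package was installed "
    "instead of the internal one: check your package index configuration."
)
'''


def write_placeholder(root: Path, dist_name: str, import_name: str, owner: str) -> None:
    """Write a minimal placeholder project that fails loudly when imported."""
    pkg_dir = root / import_name
    pkg_dir.mkdir(parents=True, exist_ok=True)

    # Metadata only: no dependencies and no install-time hooks.
    (root / "pyproject.toml").write_text(textwrap.dedent(f"""\
        [build-system]
        requires = ["setuptools>=61"]
        build-backend = "setuptools.build_meta"

        [project]
        name = "{dist_name}"
        version = "0.0.1"
        description = "Name reserved by {owner}. Contains no code."
        """))

    # Raise at import time so an accidental install is noticed immediately.
    (pkg_dir / "__init__.py").write_text(INIT_TEMPLATE.format(dist_name=dist_name, owner=owner))


# Example (hypothetical names):
# write_placeholder(Path("placeholder"), "companyp-utils", "companyp_utils", "CompanyP")
```

Raising at import time, rather than running code during installation, keeps the placeholder inert until someone actually tries to use it.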

Editor’s Note: The suggestions in this section are not comprehensive. For demonstration purposes, they cover only npm and PyPI (Python). Other package managers can be affected as well, even if they’re not mentioned here.

FOSSA senior software engineer Matt Schwartz contributed to this article.