Understanding the PURL Specification (Package URL)

The Package URL (PURL) specification is an open standard for uniquely identifying software packages across different ecosystems. It was created in 2017 by Philippe Ombredanne, an open source tooling maintainer who needed a better way to standardize references to software packages.

A PURL is a specially formatted URL that describes a software package's location or identity in a package registry. It encodes the package type (e.g. npm, Maven, PyPI), the name (and optional group or namespace), version, and other qualifiers in a single string.

PURL's purpose is to provide a simple, universal identifier for software components, making it easy to track and share what components are in your software. PURL has several important use cases, but its universality and machine-readability in particular have made it an essential part of the modern SBOM landscape.

In this post, we'll explain in more depth how PURL works (including how it varies across ecosystems), how it's used in SBOMs, how it compares to CPE and other component identifiers, and more.

How PURL Works

A Package URL is structured similarly to a web URL, with multiple components separated by specific symbols. The general syntax is:

pkg:<type>/<namespace>/<name>@<version>?<qualifiers>#<subpath>

Each part of this string has a specific meaning, which we'll outline below; you can also reference the PURL Spec’s GitHub README and PURL Types page for additional context.

Required Fields

scheme: The scheme is always pkg, indicating a package URL (just like http or https in web URLs). This constant prefix helps tools recognize a PURL immediately.
type: The package type or ecosystem, such as npm, maven, pypi, nuget, gem (RubyGems), etc.
name: The name of the package (artifact or module name).

Optional Fields

namespace: An optional namespace or group for the package, which is specific to the ecosystem. For example, in Maven, this would be the groupId (org.apache.logging.log4j), in Docker it might be an image owner, or in GitHub it could be an organization/user. Not all ecosystems have namespaces.
version: The version of the package (if applicable). Not all PURLs include a version (e.g. if you want to refer to a package in general), but typically an SBOM or vulnerability reference will include a specific version here.
qualifiers: Additional key-value pairs to further qualify the package, prefixed by ? and separated by &. Qualifiers are optional and depend on the package type.
subpath: An optional subpath within the package, appended after a #. This can point to a specific file or directory inside the package. It's used when you need to reference a particular piece of a package's content.

It's important to note that while the above fields are optional when determining whether a PURL is valid, in practice they play very important roles in making it possible to uniquely identify a package. Specifically, namespace (in ecosystems that define one) and version are important across ecosystems, and qualifiers and subpath are particularly important for Linux ecosystems.

To help bring everything together, let's consider an example PURL for a Maven Central artifact:

pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1

This identifies the Apache Log4j Core library version 2.14.1 from the Maven ecosystem (groupId org.apache.logging.log4j, artifactId log4j-core).
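To make the field breakdown concrete, here is a minimal Python sketch of how a tool might split a PURL into its components. The function name parse_purl is our own, and the sketch deliberately skips the per-type normalization rules and full validation that the PURL spec defines, so treat it as an illustration rather than a reference implementation.

```python
from urllib.parse import unquote


def parse_purl(purl: str) -> dict:
    """Split a PURL into its components (simplified; no per-type rules)."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError("a PURL must start with 'pkg:'")
    rest = rest.lstrip("/")

    # Peel the optional parts off from the right: subpath, then qualifiers.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key.lower()] = unquote(value)

    # The type is everything before the first '/'.
    ptype, sep, rest = rest.partition("/")
    if not sep or not ptype:
        raise ValueError("a PURL needs both a type and a name")

    # The version, if present, follows the last '@'.
    path, at, version = rest.rpartition("@")
    if not at:
        path, version = rest, ""

    # The last path segment is the name; anything before it is the namespace.
    segments = [unquote(s) for s in path.split("/") if s]
    if not segments:
        raise ValueError("a PURL needs a name")

    return {
        "type": ptype.lower(),
        "namespace": "/".join(segments[:-1]),
        "name": segments[-1],
        "version": unquote(version),
        "qualifiers": qualifiers,
        "subpath": subpath.strip("/"),
    }


parsed = parse_purl("pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1")
print(parsed["namespace"], parsed["name"], parsed["version"])
# prints: org.apache.logging.log4j log4j-core 2.14.1
```

For the Maven example, the namespace maps to the groupId and the name to the artifactId. Production code should use a maintained PURL library for its language rather than a hand-rolled parser like this one.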

PURL Ecosystem Coverage

PURL supports a broad array of programming language ecosystems and package managers out of the box. Common types include npm (Node.js/JavaScript packages), pypi (Python packages), maven (Java artifacts), nuget (.NET packages), gem (RubyGems), golang (Go modules), cargo (Rust crates), docker (container images), and system packages like deb (Debian/Ubuntu) or rpm (Fedora/Red Hat).

Each type definition specifies how to interpret the namespace and qualifiers for that ecosystem. For instance, Docker PURLs use the image name as the name and tag or digest as the version (e.g. pkg:docker/nginx@1.21.0 or by digest), and may use a qualifier for the registry URL.
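As a rough illustration of the Docker case, the sketch below assembles a PURL from its parts, once with a tag and once with a digest plus a registry qualifier. The build_purl helper is hypothetical, the image and registry values are made up for the example, and the percent-encoding here is a simplification of the spec's rules.

```python
from urllib.parse import quote


def build_purl(ptype, name, namespace="", version="", qualifiers=None, subpath=""):
    """Assemble a PURL string from its parts (simplified sketch)."""
    parts = ["pkg:", ptype.lower(), "/"]
    if namespace:
        # Encode each namespace segment on its own so '/' stays a separator.
        encoded = "/".join(quote(seg, safe="") for seg in namespace.split("/"))
        parts.append(encoded + "/")
    parts.append(quote(name, safe=""))
    if version:
        # A digest such as 'sha256:...' contains ':', which gets percent-encoded.
        parts.append("@" + quote(version, safe=""))
    if qualifiers:
        # Qualifiers are emitted as key=value pairs, sorted by key.
        pairs = [
            f"{key.lower()}={quote(str(value), safe='')}"
            for key, value in sorted(qualifiers.items())
            if value
        ]
        if pairs:
            parts.append("?" + "&".join(pairs))
    if subpath:
        parts.append("#" + subpath.strip("/"))
    return "".join(parts)


# Image referenced by tag.
by_tag = build_purl("docker", "nginx", version="1.21.0")

# Image referenced by digest, with the registry given as a qualifier
# (illustrative values, not a real image).
by_digest = build_purl(
    "docker",
    "dockerimage",
    namespace="customer",
    version="sha256:244fd47e07d10",
    qualifiers={"repository_url": "gcr.io"},
)
print(by_tag)     # pkg:docker/nginx@1.21.0
print(by_digest)  # pkg:docker/customer/dockerimage@sha256%3A244fd47e07d10?repository_url=gcr.io
```

Sorting the qualifiers by key keeps the output stable, so two tools describing the same image produce the same string.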

There's even a generic type as a catch-all for things that don't fit an existing ecosystem (for example, a proprietary or legacy component) or for ecosystems that build custom distributions, such as yocto or buildroot. We should note, however, that SBOM and software composition analysis tools vary widely in their ability to understand generic PURLs, so we do recommend you talk to your current (or prospective) vendor if this is an important feature for you.

PURL and SBOMs

The most common Package URL use case that we see today is the role it plays in enabling the SBOM (software bill of materials) ecosystem. An SBOM is essentially a list of the components (libraries, frameworks, modules, etc.) that make up a software product, and PURLs play a critical role by acting as the unique identifier for each component.

The NTIA's 2021 publication on mandatory SBOM minimum elements doesn't explicitly require PURL; rather, it lists PURL as one of several candidates for the mandatory software component identifier field. However, in our extensive practice supporting SBOM programs across multiple customers, it's become crystal clear that using PURLs — more so than any other type of identifier — is the best way to ensure SBOM accuracy and usability.

One big reason is that PURLs are vital to unlocking SBOM enrichment. Consider the example of an SBOM that includes open source licensing information. PURL will allow the SBOM consumer to verify that the stated licenses are actually the ones associated with a given component, not just the ones the SBOM producer decided to include. The same holds true on the vulnerability side.

In addition to enriching inaccurate or incomplete data, PURL helps SBOM consumers fill in anything that's missing entirely from an SBOM. We often see this with supplier fields as well as licensing and copyright information. (Licensing fields aren't required per the NTIA minimum elements, so enrichment comes in handy for SBOM consumers who care about the license compliance use case.)

As you might expect, both the CycloneDX and SPDX SBOM formats support PURL. For example, a CycloneDX SBOM will list each dependency with something like:

{
  "name": "lodash",
  "version": "4.17.21",
  "purl": "pkg:npm/lodash@4.17.21"
}

SPDX also supports PURLs, though in a slightly different way. In SPDX v2.2 and later, a Package URL can be included on a package entry as an External Reference whose type is purl.

The alignment of both major SBOM formats in supporting PURL underscores its importance. It allows SBOM producers and consumers to consistently identify components even when converting between CycloneDX and SPDX formats (so no component gets “lost in translation” due to naming differences).
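To show how the same PURL can travel between the two formats, here is a small Python sketch that reads components from a CycloneDX-style JSON document and produces SPDX-style package entries with a purl external reference. The helper names and the sample document are our own, the output is only a fragment of a package entry (a valid SPDX document needs more fields), and the "PACKAGE-MANAGER" category spelling is our assumption about the SPDX JSON form, so check it against the SPDX version you target.

```python
import json

# A trimmed-down CycloneDX-style document (sample data for illustration).
CYCLONEDX_DOC = """
{
  "bomFormat": "CycloneDX",
  "components": [
    {"type": "library", "name": "lodash", "version": "4.17.21",
     "purl": "pkg:npm/lodash@4.17.21"},
    {"type": "library", "name": "internal-tool", "version": "0.1.0"}
  ]
}
"""


def purl_external_ref(purl):
    """Wrap a PURL as an SPDX-style external reference of type purl."""
    return {
        "referenceCategory": "PACKAGE-MANAGER",
        "referenceType": "purl",
        "referenceLocator": purl,
    }


def to_spdx_packages(bom):
    """Map CycloneDX components to partial SPDX package entries."""
    packages = []
    for component in bom.get("components", []):
        package = {"name": component["name"]}
        if "version" in component:
            package["versionInfo"] = component["version"]
        # Carry the PURL over unchanged so the identifier survives conversion.
        if "purl" in component:
            package["externalRefs"] = [purl_external_ref(component["purl"])]
        packages.append(package)
    return packages


packages = to_spdx_packages(json.loads(CYCLONEDX_DOC))
print(json.dumps(packages[0], indent=2))
```

The second component has no PURL, so it comes through with only a name and version. That is exactly the kind of entry that is hard to enrich or match later.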

PURL vs. CPE

Another commonly used software identifier is CPE (Common Platform Enumeration). CPE is an older standard (maintained by NIST) for identifying software products, widely used in vulnerability databases like the U.S. National Vulnerability Database (NVD). Both CPE and PURL are machine-readable naming schemes for software, but they were designed for different contexts and have distinct advantages.

Similarities Between PURL and CPE

Both PURL and CPE aim to uniquely identify a piece of software in a structured format, and both use a standardized syntax. In fact, all modern identifiers (including SWID tags and others) provide a way to include key metadata like name and version in a machine-parsable string. This means in theory, either could be used to index vulnerabilities or list components.

Additionally, CPE and PURL aren't mutually exclusive; it's possible to include both in an SBOM or database for completeness. Both identifiers also depend on an agreed-upon source of names (CPE has a dictionary of vendor and product names; PURL relies on package ecosystems' naming conventions). And, importantly, both are recognized by standards bodies (CPE by NIST/ISO, PURL in progress via ECMA).

Differences Between CPE and PURL

CPE was designed to identify IT products and platforms, primarily for inventorying enterprise software and linking to vulnerabilities in the NVD. A CPE name also has a rigid format. For example:

cpe:2.3:a:apache:log4j:1.2.17:*:*:*:*:*:*:*

This encodes vendor (apache), product (log4j), version (1.2.17), and other fields (like edition, language, etc.). It works well for software where there is a clear vendor-product pairing (often commercial or big software packages).
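The fixed field order is easy to see if you split the string apart. The Python sketch below maps each position in a CPE 2.3 formatted string to its field name. The parse_cpe23 function is our own, and it uses a naive split on ":" that does not handle escaped colons, so it is only suitable for simple names like the one above.

```python
# Field order of a CPE 2.3 formatted string, after the 'cpe:2.3:' prefix.
CPE23_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe):
    """Map a simple CPE 2.3 string to a dict of named fields."""
    pieces = cpe.split(":")  # naive: ignores escaped colons
    if pieces[:2] != ["cpe", "2.3"] or len(pieces) != 2 + len(CPE23_FIELDS):
        raise ValueError("expected 'cpe:2.3:' followed by 11 fields")
    return dict(zip(CPE23_FIELDS, pieces[2:]))


fields = parse_cpe23("cpe:2.3:a:apache:log4j:1.2.17:*:*:*:*:*:*:*")
print(fields["vendor"], fields["product"], fields["version"])
# prints: apache log4j 1.2.17
```

Every position has to be present, even when its value is just the "*" wildcard. Nothing in the string says which package ecosystem the software came from.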

PURL, on the other hand, is package-focused, born out of the open source package world. It captures the package manager context (e.g. Maven vs npm) which CPE lacks, and it aligns with how developers obtain the software (via package repositories).

Another difference is complexity vs simplicity. CPE strings can be complex and sometimes require referencing a dictionary to get the naming right (for instance, knowing that “Apache Log4j” in CPE is vendor=apache product=log4j). There may be multiple CPE entries for what a developer thinks of as one library (due to different editions or platforms), and conversely, not all open source libraries have CPEs assigned unless a vulnerability is found and someone created them in the NVD. PURLs are more straightforward — they use the names as found in package ecosystems and don’t require a central authority to define each product.

As PURL creator Philippe Ombredanne explained, identifying a vulnerable library via PURL is as easy as reading its observable attributes (name, version) from code or manifests, whereas CPE demands extra knowledge of an external naming scheme.
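As a small illustration of that point, the Python sketch below derives a PURL directly from a pinned line in a requirements.txt file, with no external dictionary lookup. The requirement_to_purl function and its regular expression are our own simplification: they only handle plain name==version pins (no extras, markers, or version ranges) and do not percent-encode the version.

```python
import re

# Matches a plain pinned requirement such as 'requests==2.31.0'.
PINNED = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def requirement_to_purl(line):
    """Turn a pinned requirements.txt line into a pypi PURL, or return None."""
    match = PINNED.match(line)
    if match is None:
        return None  # comment, blank line, or an unpinned requirement
    name, version = match.groups()
    # The pypi type lowercases names and uses '-' in place of '_'.
    name = name.lower().replace("_", "-")
    return f"pkg:pypi/{name}@{version}"


print(requirement_to_purl("requests==2.31.0"))
# prints: pkg:pypi/requests@2.31.0
print(requirement_to_purl("flask>=2.0"))
# prints: None
```

Everything needed to build the identifier is already in the manifest line itself.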

Why FOSSA Recommends PURL

Given the prevalence of open source, PURL is generally better than CPE at handling the “long tail” of thousands of library dependencies. It's already used in many vulnerability intelligence sources. In fact, almost every major vulnerability database besides the NVD has started using PURLs or similar package identifiers.

As we mentioned earlier, accurate component identification is the foundation of successful vulnerability management and SBOM initiatives, and PURL solves many of the issues that exist with CPEs, including that user-provided CPEs might be too generic or use wildcards, leading to false matches.
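The wildcard problem can be sketched in a few lines of Python. The cpe_matches function below is a deliberately naive field-by-field comparison of our own (real CPE matching is defined by NIST's CPE Name Matching specification and is more involved). The point is only that a generic CPE matches two very different Log4j libraries, while their PURLs stay distinct.

```python
def cpe_matches(pattern, candidate):
    """Naive comparison: '*' in the pattern matches any value in that field."""
    p_fields, c_fields = pattern.split(":"), candidate.split(":")
    return len(p_fields) == len(c_fields) and all(
        p == "*" or p == c for p, c in zip(p_fields, c_fields)
    )


# A user-provided CPE with a wildcard in the version field.
generic = "cpe:2.3:a:apache:log4j:*:*:*:*:*:*:*:*"

# Log4j 1.x and Log4j 2.x are different libraries with different coordinates.
log4j_1 = {
    "cpe": "cpe:2.3:a:apache:log4j:1.2.17:*:*:*:*:*:*:*",
    "purl": "pkg:maven/log4j/log4j@1.2.17",
}
log4j_2 = {
    "cpe": "cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*",
    "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
}

# The generic CPE matches both, so a finding for one could be attached to the other.
print(cpe_matches(generic, log4j_1["cpe"]))  # True
print(cpe_matches(generic, log4j_2["cpe"]))  # True

# The PURLs differ in namespace, name, and version, so they cannot be confused.
print(log4j_1["purl"] == log4j_2["purl"])    # False
```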

In fairness, we should also note PURL's one major downside: lack of commercial product support. There aren't many (if any) registered types/namespaces for commercial products in PURL. This is an area where CPE is generally a better option (and is a big reason why CPE and PURL can be viewed as complementary).

PURL: The Bottom Line

The Package URL (PURL) specification plays an important role in standardizing how developers and organizations identify and manage software components. To properly associate existing and new vulnerabilities or licenses with the components they apply to, organizations first need the ability to uniquely identify packages. And, as we discussed earlier in this post, our view is that PURL is the unique identifier best suited for accurate and scalable software supply chain transparency and security initiatives.

For more information on all things software supply chain security — and to learn how your team can use FOSSA to automate SBOM and open source vulnerability management — you can check out our website.