Containers and Open Source License Compliance

Editor's Note: Attorney Kate Downing (Law Offices of Kate Downing) and FOSSA engineers Zach LaVallee and Megh Suthar contributed to this blog post. You can visit Kate's website for more information on her legal services and to get in touch. You can also connect with Zach and Megh on LinkedIn.

Containers — which are, essentially, executable, standalone packages with all files (libraries, system settings, dependencies, etc.) necessary for the software to run — are everywhere in modern application development. A recent Cloud Native Computing Foundation (CNCF) survey revealed that a staggering 84% of organizations use containers in production — up from only 23% a few years ago.

Containers have become increasingly popular for a number of reasons related to performance, security, and efficiency:

Containers can work on a number of different operating systems
They enable users to isolate certain components to fine-tune security and performance
They allow users to do a lot more with a lot less and scale efficiently

Like much of today’s technological landscape, the container ecosystem is largely fueled by open source components. However, as is the case with all things OSS, using open source components in the container environment carries certain license compliance obligations. Such is the nature of open source — creators provide innovative new technologies free of charge, with the caveat that users abide by certain requirements that govern the licensed code/files.

In this blog post, we’ll explore which parts of the open source container ecosystem carry compliance obligations, how to deal with components licensed under GPL v2 (or similar strong copyleft licenses), best practices for license compliance, and more.


Understanding the Container Environment

The container ecosystem consists of several different elements. These include:

Container images
Container recipe files (like Dockerfiles)
Container registries (like Docker Hub)
Container orchestration platforms (like Kubernetes)

Container images contain all of the software, configurations, and runtime dependencies you need for the container to run.

Containers are the run-time instances of the container image.

Container recipe files are text files that contain the instructions for building a container image. Within a container recipe file, you can write commands to set the base image, copy files, and configure the container, among other steps.

Container registries are where you can store and find existing container images. There are both public and private registries.

Container orchestration platforms are analogous to the conductor of an orchestra (where containers are the musicians). When you have multiple containers working together to provide a product, an orchestration platform dictates how they interact.

Container Images and Compliance Considerations

Because license compliance obligations for most open source licenses are triggered by distribution of the code, and the container image tends to be the only part of the container ecosystem that meets this criterion, we’ll focus heavily on container images for the remainder of this blog post.

As a starting point, it’s important to understand how container images are built. A container image consists of layers. The order of these layers matters since they can override each other. When you’re distributing a container image, you’re distributing all those layers — and you have to ensure compliance with open source licensing requirements for each, even those that aren’t visible during run-time.

Since the container image’s layers override each other, it is possible to introduce a software component in the preceding layer and remove it or alter the container such that it is no longer utilized in the subsequent layer. Although overridden software components may not be visible to the end user, they are nonetheless included in the container image distribution and have compliance obligations.
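To make the point about overridden layers concrete, here is a minimal sketch in Python. It is a toy model, not how any real image format or scanner stores data: the `Layer` type, the package names, and the file paths are all hypothetical, and a deletion is modeled simply as a path mapped to `None`. It compares what a running container can see with what is actually shipped in the image.

```python
from typing import Optional

# Hypothetical model: each layer maps a file path to the package that owns it,
# or to None when that layer deletes the path.
Layer = dict[str, Optional[str]]


def runtime_view(layers: list[Layer]) -> dict[str, str]:
    """Files visible in a running container: later layers override earlier ones."""
    view: dict[str, str] = {}
    for layer in layers:
        for path, package in layer.items():
            if package is None:
                view.pop(path, None)  # a deletion only hides the file at run-time
            else:
                view[path] = package
    return view


def distributed_packages(layers: list[Layer]) -> set[str]:
    """Every package present in any layer, which is what the image actually ships."""
    return {pkg for layer in layers for pkg in layer.values() if pkg is not None}


layers: list[Layer] = [
    {"/usr/bin/bash": "bash", "/usr/lib/libreadline.so": "readline"},  # base layer
    {"/opt/tool/tool": "build-tool"},                                  # added in layer 2
    {"/opt/tool/tool": None, "/app/server": "my-app"},                 # removed in layer 3
]

visible = set(runtime_view(layers).values())
shipped = distributed_packages(layers)
hidden_but_shipped = shipped - visible
print(sorted(hidden_but_shipped))  # ['build-tool']
```

In this toy image, `build-tool` never appears in the running container, yet it is still part of what gets distributed, so its license obligations still apply.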

The base image is the first layer of the image. Typically, the base image is an official Docker image, such as CentOS, but it can be built from scratch as well. Operating system layers are, in simple terms, layers with packages typically sourced by an OS-level package manager, such as apt or rpm.

Application layers are any layers in which an application of interest is sourced. These are the layers in which you have introduced your application or its dependencies if you are containerizing your own product.

It’s probably rare that you’ll encounter a scenario where you'll actually be distributing container recipe files, which is why we won’t be discussing them much in this blog. But if you do: keep in mind that recipe files are sometimes covered by different licenses than the images that are built from them. For example, a recipe file might be licensed under, say, MIT, but it may describe applications licensed under Apache 2.0 and ISC, running on top of Linux, which is licensed under GPL v2.

Container Distribution

Open source licensing requirements generally take effect upon distribution of the software. In other words, there’s no need to worry about source code disclosure if you use a GPL-licensed component to build a for-internal-use-only development tool. In the context of containers, distribution becomes relevant when you are allowing others to download your container images. For example, any images on Docker Hub are being distributed and therefore must comply with their respective license requirements.

Distribution of a fully built container image is quite similar to distribution of any other piece of software — the party doing the distributing must comply with licensing requirements of all the packages and software within the container image.

Operating System Layers vs. Application Layers

Operating system and application layers are considered separate and are treated differently for compliance purposes. This separation flows from the distinction between user space and kernel space. The idea is that you should in principle be able to write an application for any kind of operating system, and the way you license your application should not depend on the operating system. That's the universal understanding of how applications and operating systems should interact, provided that the application communicates with the operating system in the usual manner and the application provider isn’t modifying the operating system. Otherwise, we'd have an entire ecosystem that only works on one operating system, which wouldn't be very useful.

From a legal perspective, there is a general consensus that applications are not derivative works of operating systems and vice versa. That's why we separate them. That separation takes several forms: physical separation between layers, separation for purposes of attribution, and separation in terms of compliance policies.

Strong copyleft licenses are regularly seen in operating system layers because of the ubiquity of Linux, which is under GPL v2. Per the above, any application can run on Linux and retain license flexibility. But strong copyleft licenses in an application layer would likely prevent the application provider from licensing the application commercially (i.e., under terms other than an open source license), so strong copyleft licenses are rarely seen in commercially licensed applications. When they are, they justifiably raise concerns from potential customers and partners about whether the application provider is actually complying with such licenses.

Thus, compliance policies for application layers should be similar to policies that you’d have for any distributed application. This may include a default-deny posture for strong copyleft licenses, perhaps a “yellow” flag for weak copyleft licenses like the Mozilla Public License 2.0, and approval for permissive licenses. However, policies for the operating system generally need to allow for strong copyleft licenses.
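As a rough illustration of a layer-aware policy like the one described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the license groupings are short and incomplete (they use SPDX identifiers), the `"os"` and `"application"` labels are hypothetical, and the choice to approve weak copyleft in operating system layers and to send unknown licenses to review is ours, not a rule from any particular tool. Your own policy should come from your legal team.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    FLAG = "flag"  # the "yellow" flag: needs human review
    DENY = "deny"


# Illustrative, incomplete groupings of SPDX license identifiers (assumption).
STRONG_COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}
WEAK_COPYLEFT = {"MPL-2.0", "LGPL-2.1-only"}
PERMISSIVE = {"MIT", "Apache-2.0", "ISC", "BSD-3-Clause"}


def evaluate(license_id: str, layer_kind: str) -> Decision:
    """Return a policy decision for a license found in an "os" or "application" layer."""
    if license_id in PERMISSIVE:
        return Decision.APPROVE
    if license_id in WEAK_COPYLEFT:
        # Assumption: weak copyleft is acceptable in OS layers, reviewed in application layers.
        return Decision.APPROVE if layer_kind == "os" else Decision.FLAG
    if license_id in STRONG_COPYLEFT:
        # OS layers must allow strong copyleft; application layers default to deny.
        return Decision.APPROVE if layer_kind == "os" else Decision.DENY
    return Decision.FLAG  # unknown license: send to manual review


print(evaluate("GPL-2.0-only", "os").value)           # approve
print(evaluate("GPL-2.0-only", "application").value)  # deny
print(evaluate("MPL-2.0", "application").value)       # flag
```

The same license gets a different answer depending on the layer it was found in, which is the whole point of keeping the two policies separate.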

Since sophisticated customers and other supply chain participants expect different policies to apply to the different layers of a container, it’s best practice to divide open source attribution notices into separate sections, one for the application layer and one for the rest of the container components. This way, customers aren’t confused about whether the application developed by your company contains strong copyleft or not.
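One way to produce such a split notice is sketched below in Python. This is only an illustration under stated assumptions: the `Component` type, the `layer_kind` labels, and the section headings are hypothetical, and a real attribution notice would also carry full license texts and copyright statements rather than just identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    license_id: str  # SPDX identifier
    layer_kind: str  # "application", or anything else (base image, OS packages)


def build_notice(components: list[Component]) -> str:
    """Render a two-section notice: application components first, everything else after."""
    app = sorted((c for c in components if c.layer_kind == "application"), key=lambda c: c.name)
    rest = sorted((c for c in components if c.layer_kind != "application"), key=lambda c: c.name)
    lines = ["APPLICATION COMPONENTS"]
    lines += [f"  {c.name}: {c.license_id}" for c in app]
    lines += ["", "OTHER CONTAINER COMPONENTS (base image and operating system)"]
    lines += [f"  {c.name}: {c.license_id}" for c in rest]
    return "\n".join(lines)


components = [
    Component("bash", "GPL-3.0-or-later", "os"),
    Component("requests", "Apache-2.0", "application"),
    Component("glibc", "LGPL-2.1-or-later", "os"),
]
notice = build_notice(components)
print(notice)
```

A reader of this notice can see at a glance that the copyleft components live in the operating system section rather than in the application itself.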

Challenges with Container License Compliance

There are several open source license compliance challenges specific to the container environment. Many of these relate to the fact that containerization is still relatively new, so there’s not much in the way of case law or established compliance best practices. Similarly, there's been confusion about how the typical open source license compliance framework applies to containers.

Another big reason why container compliance is difficult is that you are responsible for compliance with each layer — even those obscured during run-time. And, you may have a lot of open source components in the container that get distributed but aren’t necessarily used.

Many legacy compliance tools and processes weren’t designed for the container ecosystem. Scanning tools generally work by analyzing lockfiles or manifest files from package managers such as npm to create an inventory of components and licenses. The rough equivalent in the container environment is the recipe file since it includes similar data about software composition. But there are key differences that make recipe files unreliable as a source of truth for scanning tools.

Specifically, it is difficult to accurately ascertain the list of dependencies, their precise versions, and their origins solely from container recipe files. This is the case because:

Container recipe files may copy dependencies from the local filesystem, which cannot be scanned or analyzed solely from the recipe file. (The most common way of building container images is to retrieve dependencies, build the application, and use the copy command in the recipe file to copy the application from the local filesystem to the container image.)
Container recipe files typically do not specify the versions of the packages they install.
Container recipe files typically do not specify the source of the packages. Even if a recipe file does specify a package's source (such as a publicly available source archive), the contents at that source may have changed since the container image was built.
Container recipe files may use other container images as a base layer whose recipe file is not distributed. As such, any dependencies introduced from the base layer are not visible.
Container recipe files can be written to fetch whole images with many open source components, and those images may or may not be well documented.
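The blind spots listed above can be illustrated with a small Python sketch that reads a recipe file and reports what it cannot tell you. This is a deliberately naive, line-based heuristic written for this post, not a real Dockerfile parser: the function name `recipe_blind_spots` and the sample recipe are hypothetical, it only recognizes `apt-get install`, and it ignores line continuations, multi-stage builds, and build arguments.

```python
import re


def recipe_blind_spots(recipe_text: str) -> list[str]:
    """List the things a scanner cannot learn from the recipe file alone."""
    findings: list[str] = []
    for raw in recipe_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        args = parts[1] if len(parts) > 1 else ""
        if instruction == "FROM":
            findings.append(f"base image contents not visible in recipe: {args}")
        elif instruction in ("COPY", "ADD"):
            findings.append(f"content copied from outside the recipe: {args}")
        elif instruction == "RUN":
            match = re.search(r"apt-get install\s+(.*)", args)
            if match:
                # Only look at the install command itself, not anything chained after it.
                words = match.group(1).split("&&")[0].split()
                packages = [w for w in words if not w.startswith("-")]
                unpinned = [p for p in packages if "=" not in p]
                if unpinned:
                    findings.append("packages installed without a pinned version: " + ", ".join(unpinned))
            if re.search(r"\b(curl|wget)\b", args):
                findings.append(f"remote download whose contents may change: {args}")
    return findings


sample_recipe = """\
FROM debian:12
RUN apt-get install -y libssl-dev zlib1g=1:1.2.13
COPY ./build/app /usr/local/bin/app
RUN wget https://example.com/tool.tar.gz
"""

findings = recipe_blind_spots(sample_recipe)
for finding in findings:
    print(finding)
```

Every line of this short sample recipe leaves a question that only the built image can answer, which is why scanning the image itself is the more reliable approach.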

How to Comply with Container Licensing Obligations

Open source license compliance — in the container environment and beyond — starts with a comprehensive understanding of the licenses involved. But, as we mentioned earlier, it can be tricky to uncover licenses for each layer of a container image because these layers can be obscured.

As a starting point, compliance is easier to manage when you're creating your own container image rather than pulling one that's ready-made. If you go this route, it’s best to start with a base image that is trusted and well-documented, such as Red Hat UBI. (There are container image security benefits to using trusted and well-documented images as well.) In contrast, Docker Hub does include its fair share of images without strong documentation (or vetting).

Unless the distributors of the base image do a good job of documenting the open source included and make the source code easily available, it may be either impossible or extremely tedious to find this information on your own. The less work the distributor does upfront, the more work you will need to do to prepare for your own redistribution of the software.

Also, it can be helpful to use the same base image for all your containers. The scope of open source license compliance (and the time required to manage it) can become massive if you use different ones for different containers. It’s also much harder to secure containers without standardizing on one base image.

Then, pay close attention to the timing of your scans. Specifically, make sure to scan after the container image is built — and after you’ve included all of your application dependencies — but before it’s deployed. Remember that an image can change depending on when you're building it.
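Because a rebuild can produce a different image, it can help to tie each scan to the exact artifact it was run against. Here is a minimal sketch in Python, assuming the image has been exported to a file on disk. The `scan_record` dictionary and the function names are hypothetical, and the digest computed here is a plain SHA-256 of the exported file, which is not the same thing as the image digest a registry reports.

```python
import hashlib
import tempfile
from pathlib import Path


def artifact_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file on disk, read in chunks so large images fit in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def scan_matches_artifact(scan_record: dict, path: Path) -> bool:
    """True if the scan was run against exactly this artifact."""
    return scan_record.get("artifact_digest") == artifact_digest(path)


# Demonstration with stand-in files; a real workflow would point at an exported image.
with tempfile.TemporaryDirectory() as tmp:
    image_tar = Path(tmp) / "image.tar"
    image_tar.write_bytes(b"contents of the first build")
    scan_record = {"artifact_digest": artifact_digest(image_tar), "licenses": ["MIT"]}
    fresh = scan_matches_artifact(scan_record, image_tar)

    image_tar.write_bytes(b"contents of a later rebuild")
    stale = not scan_matches_artifact(scan_record, image_tar)

print(fresh, stale)  # True True
```

A check like this, run just before release, catches the case where the image was rebuilt after the scan and the scan results no longer describe what you are about to distribute.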

Other Container License Compliance Strategies and Considerations

Each container is by definition considered a separate program, so there isn’t a viral licensing effect between containers. For example, an individual container could be licensed under AGPL, but you could still license the larger work under commercial terms (provided of course that you comply with the AGPL with respect to the container licensed under the AGPL).
Oftentimes, the easiest way to comply with licensing requirements is to ship a buddy container that has source code for the other containers. This approach is particularly convenient because when you ship the source code for all the open source you're using, you won't have to create a separate attribution report since the source code includes all the licensing information.
Container scanning tools generally perform better when package managers are used to install dependencies. Inspecting the recipe file will enable you to determine whether a package manager has been used.
Using a tool like FOSSA — which offers container image license scanning and management — goes a long way toward helping organizations comply with container compliance requirements.

FOSSA works by:

Identifying what container image artifact was distributed, extracting and identifying what’s in each layer, and scanning each layer looking for application- and OS-level dependencies.
Providing an audit-grade inventory of open source license types, even in image layers that are hidden during run-time.
Surfacing detailed metadata, including license text, copyright info, and compliance obligations.
Applying built-in, customizable OSS policies across company, product, and team. This includes the ability to flag or block policy violations natively via existing engineering workflows.

FOSSA comes with broad language support for application dependencies. It also supports CLI scanning for application dependencies, which addresses compliance issues that certain tools don’t. The output of scans includes data on the relationships between dependencies, even in a container image, and on which file is responsible for each dependency.

You can learn more about how FOSSA can help you manage license compliance in the container environment by getting in touch with our team today.

Note: We on the FOSSA Editorial Team are not lawyers, so if you are seeking legal advice, we suggest you speak directly to a lawyer who specializes in open source licensing.