Don’t forget to check “underneath” the iceberg

Today, every developer uses open source software (OSS) in their apps. If you’re developing modern software, you should probably be using a tool to help you track & comply with OSS licenses.

To properly discover what licenses you’re using, there’s no other way than to scan your code: you have to check every line of code across your deep dependencies for license information (ideally per-commit). Sounds like overkill, right? But for the past decade, code scanning has been a standard feature across every commercial tool that helps with OSS compliance. Plus, if you ever go through due diligence, that’s the level of detail auditors will expect, and any gaps will be used against you.

At FOSSA, full code scanning was one of the first features we built. For our customers, or any company serious about compliance, a tool without this feature was a non-starter.

Why must you run code scanning if you want to be compliant, and how can you do so without slowing down development?

Code scanning is how nearly all compliance issues are found

To understand why code scanning is important, you first have to consider the immense variety of ways developers share code. The most obvious method is by explicitly including an OSS library (usually by declaring a dependency in a software build or package file). However, in software it’s commonplace to casually copy files, code snippets, binaries or entire modules inline without a reliable way of reporting it.

Every time code is casually shared, it passes on a slew of unknown license and copyright responsibilities to every subsequent developer who uses or spreads the code. Today, developers have no easy way to see what’s inside the code they get. As more code is used/written/shared, legal obligations and risk cascade across the community. Even if your developers diligently avoid casual code sharing, they likely rely on code from developers who don’t. And if they’re using a modern language/build system, their tools are automatically pulling in thousands of OSS libraries from casual developers.

Code scanning is the only way to cover these cases, and these cases account for the majority of license violations.

When looking for tools to track your open source licenses, there are tons of free scripts and utilities to get a quick report — primarily by checking a single “package file” where developers describe the module and (hopefully) report the dominant license of their code. We call this package file parsing.
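
To make the idea concrete, here is a minimal sketch of package file parsing in Python. It assumes an npm-style package.json with a "license" field; the function name and the example manifest are hypothetical, and this is not how FOSSA or any particular tool implements it.

```python
import json

def declared_license(package_json_text):
    """Return the license declared in an npm-style package.json, or None."""
    manifest = json.loads(package_json_text)
    license_field = manifest.get("license")
    # Some older manifests use an object such as {"type": "MIT", "url": "..."}.
    if isinstance(license_field, dict):
        return license_field.get("type")
    return license_field

# Hypothetical manifest, for illustration only.
example = '{"name": "example-lib", "version": "1.0.0", "license": "MIT"}'
print(declared_license(example))
```

Note that the function only ever sees what the publisher chose to write in that one field. Nothing in it looks at the files the package actually ships.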

This data is useful, but has serious blind-spots since it accounts for only the most obvious way developers include OSS code. Even if all OSS developers properly licensed their code:

  • Package files missing or using default (automatically-assigned) license keys will list completely incorrect licenses
  • Package files only express “top-level” licenses for the publisher’s code, with nothing for files, snippets, modules or license headers included inline
  • Package files do NOT include raw copyright, notices and other data needed for creating required disclosures, notices and attribution
  • and much, much more…

These limitations don’t account for just fringe occurrences, but the bulk of how license SNAFUs enter a codebase. Undesirable code that compromises an entire product rarely comes from explicitly including a bad package, but instead through deeply-nested files or embedded sub-dependencies.
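
As a rough illustration of the gap, the sketch below looks inside file contents instead of the package file. It assumes files carry SPDX-License-Identifier tags or plain copyright lines, which is only one of many forms license data takes; the regular expressions, function names and sample files are all hypothetical, and a real scanner does far more than this.

```python
import re

# Simplified patterns; real license texts and headers vary much more widely.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([^\n*]+)")
COPYRIGHT = re.compile(r"Copyright\s+(?:\(c\)\s*)?\d{4}[^\n]*", re.IGNORECASE)

def scan_text(text):
    """Return (license identifiers, copyright lines) found in one file's text."""
    licenses = [match.strip() for match in SPDX_TAG.findall(text)]
    # Trim trailing comment terminators such as " */".
    notices = [m.group(0).rstrip(" */") for m in COPYRIGHT.finditer(text)]
    return licenses, notices

def scan_files(files):
    """files maps path -> contents; return findings for files with any hit."""
    findings = {}
    for path, text in files.items():
        licenses, notices = scan_text(text)
        if licenses or notices:
            findings[path] = {"licenses": licenses, "notices": notices}
    return findings

# Hypothetical project: the package file says MIT, a vendored file says otherwise.
files = {
    "package.json": '{"name": "app", "license": "MIT"}',
    "vendor/fastsort.c": "/* SPDX-License-Identifier: GPL-3.0-only */\n"
                         "/* Copyright (c) 2015 Jane Doe */\nint x;",
    "src/main.c": "int main(void) { return 0; }",
}
print(scan_files(files))
```

In this made-up project, package file parsing would report MIT and stop there, while the content scan surfaces a GPL-tagged file and a copyright notice sitting in a vendored directory.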

That’s why relying on just package file parsing is not only unreliable, it’s dangerous. Most compliance issues don’t come from the obvious stuff, which is why commercial tools must implement code scanning (and are typically the only ones that can afford to — it’s a lot of work to build & maintain!).

Having code scanning is key, and is usually one of the first questions we’re asked when talking to someone procuring/evaluating FOSSA. But don’t just rely on this article; ask your lawyer.

Making code scanning accessible

Code scanning is necessary, but also intimidating because it adds a lot more data to manage. If your tool is doing full code scans, it’s doing an immense amount of work for you behind the scenes (on average ~1000x as much compared to package file parsing). As a developer, the last thing I want is to have to review tons of data in order to ship my product.

“Wait what? You want me to hire people to run this tool?” — A sad guy.

Modern developers need to move fast and have high standards for their tools; you can’t implement things that will get in their way. Unfortunately, most code scanning tools weren’t made to be run on a fast & continuous basis: their output is often large spreadsheets of technical data that require immense expertise & manual review. Trying to integrate this with ongoing development just isn’t worth it, because it slows down engineers and requires massive budget/buy-in. But somehow, you need to use that data to run an effective compliance process.

How can you get value from code scanning without creating more work for yourself and your team?

At FOSSA, we spent a significant amount of time figuring out how to make code scans compatible with a fast development workflow. On top of code scanning and package file parsing, we added a set of key features to keep compliance fast and automated:

  • Static analysis — to understand how modules are laid out & used in code
  • License inferencing — analyzing the difference between package, declared and vendorized licenses (from inline dependencies)
  • Automatic dual/multi-license handling — automated policy approvals if package authors give a choice between different licenses
  • Iterative scanning & notifications — focusing only on incremental changes to the codebase
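
To give a feel for the dual/multi-license item, here is a small sketch of an automated policy check. It assumes the choice is written as a flat "A OR B" expression and that your policy is a simple set of allowed license identifiers; both are simplifications, the names are hypothetical, and this is not a description of FOSSA’s actual logic.

```python
def approved_choice(expression, allowed):
    """Return the first license option permitted by policy, or None.

    Only handles flat 'A OR B OR C' expressions. Real SPDX license
    expressions can also contain AND, WITH and parentheses.
    """
    options = [part.strip() for part in expression.split(" OR ")]
    for option in options:
        if option in allowed:
            return option
    return None

# Hypothetical policy, for illustration only.
policy = {"MIT", "Apache-2.0", "BSD-3-Clause"}
print(approved_choice("GPL-2.0-only OR MIT", policy))  # MIT
print(approved_choice("GPL-3.0-only", policy))         # None
```

When the author offers a choice and one option is already approved, nobody needs to be asked. Only the second case, where no option passes, has to reach a human.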

These features help us take an immense amount of data from code scanning and only flag what’s relevant, allowing companies to go from scanning code once a quarter to dozens of times per day.
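
One simple way to picture iterative scanning is to fingerprint file contents and rescan only what changed since the last run. The sketch below assumes content hashing is an acceptable change detector; the function names and sample files are hypothetical, and a production system would more likely lean on version control history.

```python
import hashlib

def fingerprint(files):
    """Map each path to a SHA-256 digest of its contents."""
    return {path: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for path, text in files.items()}

def paths_to_rescan(previous, current):
    """Return paths that are new or whose contents changed since the last scan."""
    return sorted(path for path, digest in current.items()
                  if previous.get(path) != digest)

# Hypothetical snapshots of a codebase before and after a commit.
last_scan = fingerprint({"a.c": "int a;", "b.c": "int b;"})
this_commit = fingerprint({"a.c": "int a;", "b.c": "int b = 1;", "c.c": "int c;"})
print(paths_to_rescan(last_scan, this_commit))  # ['b.c', 'c.c']
```

The untouched file is skipped entirely, which is what makes it plausible to run a scan on every commit instead of once a quarter.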

All of this is fully configurable and integrated with workflow tools like GitHub, JIRA, Slack and code review. You can customize every behavior down to the depth we scan or even the types of files we consider, or choose a set of standard settings (profiles) that correlate to your risk level. On our most limited profile, we’ll only scan files that *look* like they include license/copyright data.

With all that said, you can always turn off code scanning in FOSSA…

…but we’ve seen that code scanning can work really well with fast and complex development workflows — check out what SmartThings is doing!

Did I convince you?

The open source community is incredible, and we rely on it every day here at FOSSA. However, it’s also notoriously casual about sharing code and properly reporting/tracking licensing data. It only takes one of the thousand developers whose code you rely on to lack diligence and introduce a license violation.

If you’re running a compliance tool, just make sure it scans code. And of course I encourage you to try FOSSA and see if it’s right for you.

Get started for free at http://fossa.io.