One of the biggest challenges in helping companies choose the right solution for managing their open source usage is a mismatch of risk profiles, standards, and even definitions of common terms like "snippet scanning." We are frequently asked, "Do you support snippet scanning?" We believe there is no one-size-fits-all solution or answer to questions like this. To answer whether snippet scanning is right for your team, we need to delve into a couple of nuances and determine what problem you are looking to solve. There are several things to take into consideration, so if you are evaluating whether or not snippet scanning is a requirement, I want to take a moment to highlight some key questions to ask in your evaluation.

Before we dive in too deeply, let's define snippet scanning. Our customers and partners have defined a snippet as broadly as an entire file of code and as narrowly as a function copied and pasted from Stack Overflow. For this article, let's use the definition that snippets are copied functions or lines of code.

Question 1: What does your development stack look like, and what languages are you using?

Snippet scanning was part of the first wave of Software Composition Analysis (SCA) tools, before the rise of package managers. In order to use open source with C/C++, developers often copy and paste entire open source components into their code base, making it necessary to match code in order to effectively identify open source components. There was no package manager and no package.json file, only self-imposed systems of organization to highlight which open source components were used. With modern programming languages, however, this is simply not the case. Today, with languages like Ruby, JavaScript, Python, and Go, open source is not simply copied and pasted. Instead, open source is incorporated into proprietary software with a simple command such as npm install or pip install, which downloads the entire open source component, not just a snippet.
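To illustrate why a manifest makes identification simpler than code matching, here is a minimal sketch in Python that lists the components declared in a package.json file. The manifest contents and the function name declared_dependencies are made up for illustration, and this is not a description of how any particular SCA tool works. A real tool would typically also consider lockfiles and transitive dependencies.

```python
import json

# Hypothetical package.json contents, inlined so the sketch is self-contained.
MANIFEST = """
{
  "name": "example-app",
  "dependencies": {"express": "^4.18.2", "lodash": "^4.17.21"},
  "devDependencies": {"jest": "^29.7.0"}
}
"""


def declared_dependencies(manifest_text):
    """Return {package_name: version_range} declared in a package.json string."""
    manifest = json.loads(manifest_text)
    deps = {}
    # Both sections name whole open source components, not fragments of code.
    for section in ("dependencies", "devDependencies"):
        deps.update(manifest.get(section, {}))
    return deps


if __name__ == "__main__":
    for name, version in sorted(declared_dependencies(MANIFEST).items()):
        print(name, version)
```

The point of the sketch is that the component names come straight from a declaration the developer already wrote, so no source code has to be matched against anything.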

Question 2: What level of granularity do you need?

Snippet detection (or rather detection of functions, or portions of functions) will riddle your SCA scans with false positives, no matter what. For example, take a basic for loop:

for (i = startValue; i <= endValue; i++) {
    // Before the loop: i is set to startValue
    // After each iteration of the loop: i++ is executed
    // The loop continues as long as i <= endValue is true
}

Searching GitHub for this line of code yields over 600,000 results. Yes, it may have been copied from an open source component, but more than likely the engineering team is simply following best practices, not using open source without adhering to its license.


In fact, it is good that your engineering team is using these best practices and standard formats. They will trigger many, many false positives in snippet scanning and overload your legal team, but using these standards helps new engineers ramp up on the team and helps streamline code reviews.
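The false positive problem can be sketched with a deliberately naive line-level matcher in Python. Everything here is an assumption made for illustration: the two "open source" projects, the proprietary snippet, and the matching strategy (hash each whitespace-normalized line) are hypothetical, and real snippet scanners use more sophisticated fingerprinting.

```python
import hashlib
import re


def normalize(line):
    """Drop a trailing // comment and all whitespace so formatting does not matter."""
    code = line.split("//", 1)[0]
    return re.sub(r"\s+", "", code)


def fingerprints(source):
    """Map fingerprint -> normalized line for every non-trivial line of source."""
    result = {}
    for line in source.splitlines():
        norm = normalize(line)
        if len(norm) < 8:  # skip blank lines and lone braces
            continue
        result[hashlib.sha256(norm.encode()).hexdigest()] = norm
    return result


# Hypothetical open source corpus: two unrelated projects.
OPEN_SOURCE = {
    "project-a": "for (i = 0; i < n; i++) {\n    total += values[i];\n}",
    "project-b": "for (i = 0; i < n; i++) {\n    log_value(i);\n}",
}

# Hypothetical proprietary code that was written from scratch.
PROPRIETARY = "for (i = 0; i < n; i++) {\n    invoice_total += line_items[i];\n}"


def matches(source, corpus):
    """Return {project: [matching normalized lines]} for every project sharing a line."""
    mine = fingerprints(source)
    hits = {}
    for project, text in corpus.items():
        shared = set(mine) & set(fingerprints(text))
        if shared:
            hits[project] = sorted(mine[f] for f in shared)
    return hits


if __name__ == "__main__":
    print(matches(PROPRIETARY, OPEN_SOURCE))
```

In this sketch the proprietary code is flagged against both projects, even though the only thing they share is an idiomatic loop header that nobody needed to copy.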

Is this the granularity you are looking for? Or are you looking for files? Folders? Components?

Copying over an entire file in C/C++ is something your developers may do. You may want a tool that can identify open source components based on file or folder structure. To us at FOSSA, this is a different definition of snippet scanning, and it is functionality that may be needed to support vendored package scanning. TL;DR: Make sure you understand where you and your legal team want to draw the line.
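For comparison, file-level identification can be sketched as an exact content-hash lookup. This is a minimal Python illustration under stated assumptions: the index of known files, the component name "tinyjson", and the repository contents are all hypothetical, and this is not a description of FOSSA's implementation. Exact hashing also misses files that were modified after being copied, which a real tool would need to handle some other way.

```python
import hashlib


def file_digest(content):
    """SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical upstream file and the index a tool might build from it.
TINYJSON_C = b"/* tinyjson.c */\nint parse(const char *s) { return 0; }\n"
KNOWN_FILES = {file_digest(TINYJSON_C): "tinyjson 1.0 (hypothetical)"}

# Hypothetical repository: one vendored copy and one first-party file.
REPO = {
    "third_party/tinyjson.c": TINYJSON_C,
    "src/main.c": b"int main(void) { return 0; }\n",
}


def identify_vendored(files, index):
    """Return {path: component} for files whose bytes exactly match a known file."""
    found = {}
    for path, content in files.items():
        component = index.get(file_digest(content))
        if component is not None:
            found[path] = component
    return found


if __name__ == "__main__":
    print(identify_vendored(REPO, KNOWN_FILES))
```

Because a whole file has to match, an idiomatic loop or a common helper function on its own does not trigger a hit, which is the trade-off between granularity and noise discussed above.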

Question 3: What is your risk profile?

To date, there have been no major court cases involving snippets. In fact, one of the most prominent court cases involving open source leads me (not a lawyer - please do not take this as legal advice) to believe that snippets pose almost no risk. Oracle sued Google for using Java APIs. However, Mark Radcliffe (a premier open source counsel - do take his advice) summarized the case, stating that Judge Alsup issued a decision finding that the Java APIs were not protectable under copyright law.

Judge Alsup states:

"So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification of any methods used in the Java API. It does not matter that the declaration or method header lines are identical. Under the rules of Java, they must be identical to declare a method specifying the same functionality—even when the implementation is different. When there is only one way to express an idea or function, then everyone is free to do so and no one can monopolize that expression. And, while the Android method and class names could have been different from the names of their counterparts in Java and still have worked, copyright protection never extends to names or short phrases as a matter of law."

Summary

Is snippet scanning right for your team? Possibly, but in practice most teams find that it adds more burden for the legal and engineering teams resolving flagged issues than it provides value in highlighting potential copyright infringement. Instead, we find the biggest supporters of snippet scanning to be the original SCA tools, which were built for a different era of software development entirely.