TikTok has been in the news, and not for good reasons. ByteDance, the Chinese parent company of TikTok, has been using the wildly popular social media video-sharing app to collect locations, browsing behavior, and search histories from most of its hundreds of millions of users in the U.S., Canada, Australia, and Europe.

The concern has become an issue of national security. President Trump has issued multiple executive orders to ByteDance and Tencent Holdings, the Chinese parent company of the  social messaging app WeChat, to divest interest in U.S. operations within 90 days. And Microsoft declared its intent (alongside interest expressed by both Oracle and Twitter) to acquire and fully patriate TikTok, its data, and its algorithms as subsidiaries of its main U.S. entity.

There are outstanding questions about what U.S. patriation actually means in the case of TikTok, since most of its business operations, executive staff, revenues, and users (notably, TikTok is not available in China) are already American but I’ll reserve speculation for now. In any case, the surveillance implications of TikTok’s current setup are concerning, since most of its runaway success originates in ByteDance’s algorithms driving recommendations, engagement, and ad impressions that are reliant on data enriched across multiple Chinese sources and, therefore, potentially available to the central government of the People’s Republic of China.

But, beyond the known-knowns of U.S. user data flowing into Chinese proprietary apps and devices (like the ones made by Huawei and ZTE), there are also the known-unknowns of foreign-origin code used in almost all software developed in the West. What could President Trump’s determination that Chinese-owned software is a security threat mean for other software and the businesses and users who rely on it? In particular, there is a series of questions these events raise about open source software and the open source community as a whole. Do you know where your open source software comes from? Should it matter?

Open Source and the Global Community

The concept of free and open-source software (OSS) has been around almost as long as software itself. Early pieces of software were written by academics who collaborated to create it and naturally shared source code.

As time went on, companies started to see source code as proprietary a source of revenue. Then came the restrictive license agreements, monolithic software development, and source code locked down like Fort Knox.

The introduction of Linux rekindled the dying OSS flame. 1998 brought the Open Source Initiative, and now OSS is a thriving community of professional and hobbyist engineers working to make software better. GitHub alone boasts more than 50 million developers worldwide and 100 million repositories, most containing open source code. According to Gartner’s 2019 Software Composition Analysis Report, up to 90% of every piece of software is from open source, and 90% of companies building any type of software include OSS in their code.

But that’s just the thing. OSS has given rise to a global community. Open source projects include dependencies on countless other open source contributions from potentially anyone and anywhere in the world. And many open source projects thrive on a meritocracy of completely distributed code commits. Managing the particularities of licensing and governance for all those dependencies interacting in production is itself a challenge, no less the implications of geographic origin.

Can You Know Where Code Comes From?

Companies depend on OSS to quickly release, iterate on, and update their digital products. But open source dependencies are challenging to gain visibility into, let alone fully understand what code they contain. Can you know where the OSS code you depend on every day comes from?

To do so would require tremendous effort. Code repositories such as Git offer an unparalleled history of all code changes within a project. So you’ll know who checked in what code and when they did it. But then, that’s only the history of the names developers have chosen to enter into the system. And a complete list of contributions is not always available unless the repository and all contributing repositories exist in a public forum such as GitHub (keeping in mind that GitHub also offers and monetizes private repositories).

Chinese companies like Alibaba (in addition to Tencent and Baidu, not shown) are among the top open source contributors.

The theoretical scope of any undertaking to audit the geographic origins of all OSS contributions in real time is enormous. To thoroughly track where all of the code in you application comes from, you’d have to:

  • Gain access to all contributions
  • Determine where each contribution originated
  • Read and understand every piece of code
  • Determine and enforce policies that govern geographic origin

No doubt, most companies wouldn’t dare spend the amount of money, time, and effort required to dig into 90% of their software in this manner, especially since, in almost every case, the process would be completely manual.

Ultimately, full government regulation compelling private industry to worry about this issue would be hugely disruptive and unlikely to pass. More likely, policy would take the form of a steady encroachment in the form of rules, reporting requirements, and some restrictions based on the software’s sensitivity to national security. That takes two forms: critical public infrastructure and overall ubiquity. The open source components that help run the power grid, military systems, and transportation platforms are examples of critical infra. TikTok and WeChat are examples of ubiquity. Outside of these two categories are the most likely to be targeted by additional policies: highly regulated private areas like communications networks, health systems, and financial services operations.  

But maybe a more important question than how and who must be asked: Should we know where the code comes from?

“Geo-Filtering” OSS: A Slippery Slope?

Concerns about software that comes from particular regions of the world, such as China or Russia, are rapidly increasing and gaining visibility. Companies, customers, and governments all must take security and privacy threats seriously.

The idea of blocking OSS that originates from certain countries could, in theory, be considered an option. All open source components could be vetted as part of the CI/CD process, contributions outside of policy could prevent the build from shipping, and those from restricted geographies would ultimately not be used in production code.

But let’s step back for a minute. If the OSS community is global, and anyone can contribute to any project, the idea of geo-filtering easily becomes the crest of a slippery slope. After all, until now, open source was actually the solution to the problem of security controls on certain foreign-origin software.

It’s easy to ban software from one country. But what if a piece of software could have code written by developers all over the world? One program could have a U.S. developer contribute alongside a developer from the U.K., Russia, China, Japan, Argentina, Egypt, Israel, and many others. That’s ultimately the entire nature of the open source ethos and, arguably, the source of benefit when it comes to open source software’s better security, extensibility, interoperability, and scalability.

More strikingly, location is arguably a false flag, since anyone who has seen a post-Cold War spy movie knows that identity and loyalty are rarely perfectly aligned to geography, nationality, or heritage.

So, when is it justifiable for governments to compel companies to dig into the demographics of every OSS contribution? Where do we decide where to draw the line? How do we understand the balance between perceived and real security threat, no less the cost and benefit of  researching every developer who has ever contributed a line of code or fix?

Would your budget collapse under the weight of cyber sleuthing? Or, like so many other elements of security and governance, would automation be the only practicable solution?

Do You Know Where Your OSS Comes From?

We’ve raised many questions here. Right now, that’s all the industry has. Questions are the first step towards finding solutions. But what do we do now?

We could ignore everything that’s happening. Just because a few Chinese companies have gained traction among consumers in the U.S., it’s as yet unclear whether federal decisions are being driven by an actual non-neutral security threat or another hardline trade policy.

On the other hand, some pundits believe nefarious foreign state data access is the greatest existential threat to the safety of the world’s richest democracies. That would arguably make foreign-origin contributions to open source software the primary surface area for non-compliance with future security policy.

While one definitive solution has proven elusive, there are some options.

We could just keep the status quo and perform the standard security reviews, enhancing the open source infrastructure we’ve chosen and trusting our practices and processes to vet it.

Complete lockdown is a second option. In an age when Chinese government access to foreign consumer data is a documented source of state-run surveillance programs, unmitigated use of open source components may be truly dangerous.

A third option is to fork open source projects and maintain them only in the U.S. and partner countries. This is a practicable short-term solution, especially for critical infrastructure. But the Great Fork will likely become difficult to maintain and fall behind current innovations.

But it’s unlikely the nationality of software will go completely unregulated in the future or that we’ll ever return to building every piece of software ourselves, even if we’d then know the code is safe. The solution is certainly somewhere in the middle. And, possibly more importantly, effective governance will almost assuredly rely on automation that helps every company scale its policy enforcement to maintain the pace of open source.

A solution is not here yet. But we can and should start thinking about the implications of open source software’s origins, the changing direction of federal policy, and our responsibility to maintain our customers’ privacy and our national security.