Pathologies of Go Package Management

TL;DR:

Go package idioms make reproducible builds really easy.
Go package idioms also make dependency analysis really hard.
Go builds have a unique failure mode: they can be reproducible without having semantic dependency information fully specified.

Go is one of my favorite languages. It gets a lot of things right: great
tooling, a pragmatic language design, and a sane module system.

Go also has a couple of key faults, but that's a blog post for another time.

I've found that newcomers to Go tend to get really confused by the module
system. It's both less flexible and less rigid than the module systems of other popular languages:

- Newcomers from languages that have canonical package managers (e.g. Node.js, Ruby, Rust) are often confused by the lack of a canonical package management tool and centralized package registry.
- Newcomers from languages whose module systems are Wild West free-for-alls (e.g. C, Java, sort of PHP) are often frustrated by the rigid import path structure, lack of parent packages, and restrictions on cyclic imports.

Fortunately, the module system (Go calls these "packages", and I'll use this
terminology for the rest of the post) and the idioms that have evolved around it are extremely simple.

How Go packages work

Any folder that contains a Go source file is a Go package. Its name is the path of the folder relative to $GOPATH/src. For example, a folder containing Go source files located at $GOPATH/src/github.com/alice/foo is named
github.com/alice/foo.

GOPATH is generally a single folder (but can be a list of folders). When a
user compiles a Go package that imports a package P, the compiler attempts to resolve P by checking $GOROOT/src/P first (so standard library packages cannot be shadowed) and then F/src/P for each folder F in GOPATH.

This has a few properties:

1. Wow, wasn't that simple? This algorithm is extremely easy to explain and understand.
2. You cannot have two different packages with the same name. In particular, this implies that you cannot have two different versions of the same package.
3. Every package has only a single version, so you can think of the entire Go workspace as having a single "version". Version information for each individual package is not reliably stored (at best, you have a revision hash or named reference from the VCS repository of the package).

When you're working on multiple projects that may depend on different versions
of the same package, Property 2 turns out to be pretty annoying.

In Go 1.5, the Go team added support for vendoring to resolve this. Now, when a source file S imports a package P, the resolution algorithm is:

1. Does there exist a folder named vendor in any ancestor folder A of S? If so, use A/vendor/P if P is in the folder, otherwise keep going upwards. If you're at the root of the filesystem and haven't found P, then go to step 2.
2. Do the old thing (look up in GOPATH and GOROOT).

Again, this resolution algorithm is both simple and robust. It's easy to create tools that support this workflow, and easy to understand how the compiler is resolving a package. It's also idiomatic to commit the vendor folder into version control, which makes builds extremely robust:

- Developers without the original tools for managing dependencies can still produce a working build.
- There's no possibility for dependency versions or sources to be different among different builds, since source files are committed to version control.
- There are no centralized package registries that must be available during the build, so outages don't break builds (looking at you, NPM).

Before vendoring, different tools would use different kinds of hacks (messing
with your GOPATH or GOROOT, rewriting import paths, etc.) to provide
project-level isolation of dependencies. With vendoring, there is One Obvious
Way.

Easy builds do not guarantee easy dependency analysis

Unfortunately, vendoring support did not come with versioning support. Doubly
unfortunately, my day job involves understanding what
versions of a dependency went into your build via the
FOSSA CLI. Allow me to give you a peek
into the rabbit hole that is Go dependency analysis.

Using a sane dependency management tool (ideally `dep`)

A good dependency manager (like dep) will do a few things:

- It solves version constraints of the entire transitive graph, correctly reporting when builds are not possible due to the diamond problem.
- It recursively flattens the vendor folders of dependencies.
- It provides easily parsed command output or (even better) an easily parsed lockfile.

This is the best-case scenario. Analysis of these projects works basically the
way you'd imagine:

1. Ask go list for the transitive dependencies of a target package (either go list -f '{{ .Deps }}' or the Deps field of go list -json; the two output flags are alternatives).
2. For each package, look up the version of the package used by asking the build tool or reading its lockfile. This is easy: each package has exactly one version, and they're all known by the build tool.

Using a tool that allows nested vendor folders (please do not)

Nested vendor folders open a big can of worms:

- You may bring in multiple copies or versions of a single package due to the diamond problem. If any of these packages expects to be a singleton, it will likely break in ways that are difficult to debug.
- Bringing in multiple instances of a package causes compatibility problems for consuming packages: each vendored copy is a distinct package to the compiler, so a type from one copy cannot be used where the same type from another copy is expected.

Dependencies with nested vendor folders may also use a different tool to vendor their dependencies. This means that several tools may all be specifying different versions of a package.

For example, a project may have a direct dependency foo using tool A to
specify transitive dependency bar with version v1. Elsewhere in its
transitive dependency graph, it may have a direct dependency on bar using tool B to specify version v2.
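One way to spot this situation is to group the full import paths in a build by the package they actually provide. The sketch below assumes GOPATH-style paths, where a vendored copy shows up with its vendor folder in the path. The helper names and the example paths are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// providedPackage returns the import path that source files use to refer
// to a package, i.e. the part after the last "/vendor/" element.
func providedPackage(fullPath string) string {
	if i := strings.LastIndex(fullPath, "/vendor/"); i >= 0 {
		return fullPath[i+len("/vendor/"):]
	}
	return fullPath
}

// duplicateCopies groups full import paths by the package they provide and
// keeps only packages with more than one copy. It assumes fullPaths has no
// repeated entries.
func duplicateCopies(fullPaths []string) map[string][]string {
	groups := make(map[string][]string)
	for _, p := range fullPaths {
		name := providedPackage(p)
		groups[name] = append(groups[name], p)
	}
	for name, copies := range groups {
		if len(copies) < 2 {
			delete(groups, name)
			continue
		}
		sort.Strings(copies)
	}
	return groups
}

func main() {
	paths := []string{
		"github.com/alice/app/vendor/github.com/x/bar",
		"github.com/alice/app/vendor/github.com/y/foo",
		"github.com/alice/app/vendor/github.com/y/foo/vendor/github.com/x/bar",
	}
	for name, copies := range duplicateCopies(paths) {
		fmt.Println(name, "is present", len(copies), "times:")
		for _, c := range copies {
			fmt.Println("  ", c)
		}
	}
}
```

Finding the copies is the easy half. Each copy may still be pinned by a different tool's lockfile, which is where the importing file's location comes in.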

To analyze these cases, we have to take the location of an importing file into
account for projects with nested vendor folders. We have several options for
this:

--option allow-nested-vendor:true
enables resolution logic for using lockfiles in nested vendor folders.
--option allow-deep-vendor:true
enables resolution using lockfiles that are above the root of the nested vendor
folder. This supports projects where packages in nested vendor folders may
have their versions specified in the top-level lockfile.

Code is written by humans and intent is impossible to infer

Sometimes, version information is intentionally missing for imported packages, because the author of the code doesn't consider the imported package to be external to the project.

When we analyze a Go package, the target package is usually provided as a Go
import path (e.g. github.com/alice/foo/cmd/foobar). From this import path, we try to infer the project of the package by finding the root of the VCS
repository that the package is contained in (e.g. github.com/alice/foo). For
any imports that are within the project, we don't try to look up version
information because these imports are generally considered internal. For example, if
github.com/alice/foo/cmd/foobar imported github.com/alice/foo/lib/quux, we would not expect the version of github.com/alice/foo/lib/quux to be specified by the build tool, since quux is part of the same project and would be versioned with
foobar.
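The "is this import internal?" check boils down to a prefix test on import paths. This is a minimal sketch, assuming the project root has already been inferred from the VCS repository. The isInternal helper is a hypothetical name, not the analyzer's API.

```go
package main

import (
	"fmt"
	"strings"
)

// isInternal reports whether importPath lives inside the project rooted at
// projectRoot. The match must end on a path segment boundary, otherwise
// github.com/alice/foobar would count as part of github.com/alice/foo.
func isInternal(projectRoot, importPath string) bool {
	return importPath == projectRoot || strings.HasPrefix(importPath, projectRoot+"/")
}

func main() {
	root := "github.com/alice/foo"
	for _, p := range []string{
		"github.com/alice/foo/lib/quux",
		"github.com/alice/foobar",
		"github.com/bob/bar",
	} {
		fmt.Printf("%-32s internal=%v\n", p, isInternal(root, p))
	}
}
```

A check like this can only encode the repository boundary. It has no way of knowing that two separate repositories are meant to be versioned together.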

For some projects, this assumption is not true. For example, consider the case
where Alice is creating two projects A and B that she intends to be released together. These projects are stored in separate VCS repositories due to external factors (e.g. maybe one project is open source and the other isn't), but her intent is for them to be a single unit.

In this case, knowing the project of a package within A is not enough to know that the intent is for the project to be versioned with B. This human intent is impossible to infer automatically.

Our analyzer provides several options for handling this:

--option allow-unresolved-prefix:IMPORT_PATH
informs the analyzer to not look up the version for packages with a certain
import path prefix. If this flag is not specified, a missing version during
an analysis is considered an error due to an underspecified build.
--option allow-external-vendor:true
informs the analyzer to look up dependency versions in lockfiles of other
projects. This is useful for looking up the versions of dependencies of
projects that are versioned together.

The same code can be built in different ways

Go supports build constraints,
which can include or exclude files based on the target OS or architecture. This can alter large portions of a project's transitive graph, since package imports occur on a per-file basis.

Also, go list exits non-zero when a package's source files are all excluded by a build constraint. This was a fun surprise to learn about and write special-case error handling code for.

Support for multiple build tags is still in progress.

Go projects can have C dependencies

There is basically no good, general way to identify C dependencies. At this
point, it becomes a search and indexing problem rather than a lookup problem:

How many different signals can we aggregate to identify a C dependency?
Can we identify packages from a system-level package manager?
Can we identify C source files from names or hashes?
Can we inspect the compiler or linker's runtime behavior?

For special cases, there are ways to handle this. If a project is always built
in a special Docker container with its C dependencies, it may be possible to
read dependencies from a system-level package manager. If there is a set of
vendored C dependencies, it may be possible to identify them from their sources.
Support for this is in progress as well.

In general, this is still an open question.

A reproducible build is not necessarily a fully specified build

All of the above cases are examples of when detecting the version of a
dependency gets very tricky. Analyzing these projects is hard but doable.

In some cases, analysis is just plain impossible. Some projects are buildable
because their sources are fully committed into version control, but don't
include any kind of lockfile or version data.

These builds are reproducible but not fully specified. This is a
subtle failure mode unique to Go.

Our analyzer supports these cases too with --option allow-unresolved:true, which treats version resolution errors as warnings instead.

The only way to get dependency information for revisions in the absence of
lockfiles is to index Go packages by hash and try to look up known hashes of
known versions. go-resolve is a side
project of mine trying to address exactly this.
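The idea can be sketched as follows. To be clear, this is not go-resolve's implementation, just an illustration of content-based lookup under assumptions of mine: packages are represented as in-memory maps from file name to contents, and the index of known hashes is built by hand.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashPackage computes a digest over the file names and contents of a
// package. files maps file name to contents. Names are sorted first so the
// result does not depend on map iteration order.
func hashPackage(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	h := sha256.New()
	for _, name := range names {
		// Length prefixes keep name/content boundaries unambiguous.
		fmt.Fprintf(h, "%d:%s%d:", len(name), name, len(files[name]))
		h.Write(files[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Hypothetical index built ahead of time from known releases of a package.
	v1 := map[string][]byte{"bar.go": []byte("package bar\n")}
	v2 := map[string][]byte{"bar.go": []byte("package bar\n\nfunc New() {}\n")}
	index := map[string]string{
		hashPackage(v1): "v1.0.0",
		hashPackage(v2): "v2.0.0",
	}

	// A vendored copy with no lockfile entry: identify it by content alone.
	vendored := map[string][]byte{"bar.go": []byte("package bar\n\nfunc New() {}\n")}
	if version, ok := index[hashPackage(vendored)]; ok {
		fmt.Println("matched known version", version)
	} else {
		fmt.Println("no known version matches this copy")
	}
}
```

An exact-hash lookup like this fails as soon as a vendored copy has been patched locally, which is part of what makes the general problem hard.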

Fin.

This is but a small portion of the madness that consumes the weekdays of my life.
Despite these issues, Go is still one of my favorite ecosystems to work
with. (It could be a lot worse: Ruby's package manifest is literally a Ruby
file that you eval.)

Have I mentioned that FOSSA is hiring?
We're building infrastructure to make open source more accessible to everyone.
If doing impactful work, tackling deeply technical challenges, and working with a great team to build a sustainable business sounds appealing to you,
please let us know or contact me directly.