
Reproducible Builds

A set of software development practices that create an independently verifiable path from source code to binary, ensuring that a given version of the source code always produces bit-for-bit identical output regardless of who builds it.

What are Reproducible Builds?

Reproducible builds (also known as deterministic builds) are a set of software development practices that ensure the same source code, built with the same build instructions in an equivalent build environment, always produces bit-for-bit identical output, regardless of who builds it or when. This property lets anyone independently verify that a binary was indeed built from the claimed source code, without hidden modifications, malicious code, or unauthorized changes.

The goal of reproducible builds is to establish a verifiable path from source code to binary, enhancing trust in the software supply chain and enabling third-party validation of software artifacts.

Why Reproducible Builds Matter

Supply Chain Security

Reproducible builds make it significantly more difficult for attackers to inject malicious code during the build process. If independent parties rebuild the same source and get a different binary, the unauthorized modification shows up in a simple binary comparison.

Trust Verification

Users and organizations can independently verify that a binary corresponds to the published source code, rather than trusting that build outputs haven't been tampered with.

Compliance Requirements

Regulatory and procurement frameworks increasingly ask for evidence that deployed software matches its source code, and reproducible builds are one way to provide it.

Build System Debugging

When build outputs differ unexpectedly, reproducible build practices make it easier to identify and fix the sources of non-determinism.

Technical Challenges to Reproducibility

Several factors can cause identical source code to produce different binaries:

Timestamps and Build Dates

Many build tools embed the current date and time into compiled artifacts:

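// __DATE__ and __TIME__ expand to the moment of compilation, so every build
// embeds a different string (GCC 7+ substitutes SOURCE_DATE_EPOCH when it is set)
#define BUILD_TIMESTAMP __DATE__ " " __TIME__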
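The effect on verification is easy to demonstrate. The Python sketch below uses a hypothetical `fake_compile` function as a stand-in for a compiler that appends a build stamp to its output. No real toolchain is involved. It compares SHA-256 digests of the results.

```python
import hashlib
from datetime import datetime, timezone


def fake_compile(source: bytes, build_time: datetime) -> bytes:
    """Hypothetical compiler: output = source plus a __DATE__/__TIME__-style stamp."""
    stamp = build_time.strftime("%Y-%m-%d %H:%M:%S").encode("ascii")
    return source + b"\nBUILD_TIMESTAMP=" + stamp


def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()


source = b"int main(void) { return 0; }"
monday = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
tuesday = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)

# Same source, different build times: the digests differ.
print(digest(fake_compile(source, monday)) == digest(fake_compile(source, tuesday)))  # False
# Same source, same fixed build time: the digests match.
print(digest(fake_compile(source, monday)) == digest(fake_compile(source, monday)))  # True
```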

Filesystem Ordering

Directory listings come back in an order that depends on the filesystem and operating system. Tools that process inputs in listing order can therefore produce different output on different machines:

// Order of these files might differ across systems
src/
  file1.c
  file2.c
  ...
  fileN.c
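The usual fix is to sort inputs explicitly rather than trust the listing order. Python's own `os.listdir` documentation describes that order as arbitrary. The sketch below uses a hypothetical `collect_sources` helper that returns source files sorted by their relative POSIX-style path. Plain code-point order is used so the locale cannot affect the result.

```python
import tempfile
from pathlib import Path


def collect_sources(root: str, suffix: str = ".c") -> list[str]:
    """Return source files under root in a stable, filesystem-independent order."""
    base = Path(root)
    # rglob() yields paths in whatever order the filesystem returns them,
    # so sort the relative paths explicitly (code-point order, like `LC_ALL=C sort`).
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob(f"*{suffix}")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "src").mkdir()
    # Files are created in a deliberately scrambled order.
    for name in ["file2.c", "file10.c", "file1.c", "notes.txt"]:
        Path(tmp, "src", name).write_text("/* ... */\n")
    result = collect_sources(tmp)

print(result)  # ['src/file1.c', 'src/file10.c', 'src/file2.c']
```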

Build Path Embedding

Many compilers embed absolute file paths in debug information:

// Debuginfo might contain
// "/home/user1/project/src/main.c" vs.
// "/home/user2/project/src/main.c"
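Compilers address this by rewriting path prefixes. GCC, for example, offers `-ffile-prefix-map=old=new`. The Python sketch below imitates that rewriting so two checkouts in different home directories record the same path. The `remap_path` helper and its longest-prefix-wins rule are choices made for this illustration. They do not describe GCC's internals.

```python
from pathlib import PurePosixPath


def remap_path(path: str, prefix_map: dict[str, str]) -> str:
    """Rewrite a directory prefix in the spirit of -ffile-prefix-map (hypothetical helper).

    Matching works on whole path components, so /home/user1 does not match /home/user10.
    """
    p = PurePosixPath(path)
    # Try longer prefixes first so the most specific mapping wins (a sketch-level choice).
    for old in sorted(prefix_map, key=len, reverse=True):
        try:
            rest = p.relative_to(old)
        except ValueError:
            continue  # path is not under this prefix
        return (PurePosixPath(prefix_map[old]) / rest).as_posix()
    return path  # no mapping applies; leave the path unchanged


a = remap_path("/home/user1/project/src/main.c", {"/home/user1/project": "."})
b = remap_path("/home/user2/project/src/main.c", {"/home/user2/project": "."})
print(a == b)  # True: both builders now record the same relative path
```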

Random Number Generation

Entropy sources used during builds may produce different outputs each time:

// Randomly generated identifiers: a new value on every build
const uuid = crypto.randomUUID();
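One common remedy is to derive identifiers from the build inputs instead of drawing fresh randomness. Where randomness is genuinely needed, a fixed seed serves the same purpose. The Python sketch below uses the standard library's name-based `uuid.uuid5` and a seeded `random.Random`. The `stable_id` helper and its `urn:example-build:` naming scheme are made up for illustration.

```python
import random
import uuid


def stable_id(name: str) -> uuid.UUID:
    """Derive an identifier from build inputs instead of fresh randomness.

    uuid.uuid4() draws new random bits on every call, so embedding its result
    changes the artifact on every build. uuid.uuid5() hashes a namespace plus
    a name, so the same input always yields the same UUID.
    """
    # The "urn:example-build:" prefix is a hypothetical naming scheme.
    return uuid.uuid5(uuid.NAMESPACE_URL, f"urn:example-build:{name}")


# A fixed seed plays the same role as GCC's -frandom-seed: the "random"
# sequence is identical on every run.
rng = random.Random(42)
shuffled = rng.sample(range(10), k=10)

print(stable_id("libfoo") == stable_id("libfoo"))  # True
```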

CPU-Specific Optimizations

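Flags that tune code for the build machine's own processor let the compiler choose different instructions on different hosts, so the same source produces different binaries:

# -march=native targets whatever CPU the build runs on (e.g. AMD vs. Intel)
gcc -O2 -march=native -c main.c
# A fixed target gives the same output on any x86-64 host (same compiler version)
gcc -O2 -march=x86-64 -c main.c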

Environment Variables

Build scripts that read environment variables without setting them explicitly behave differently depending on the builder's shell:

# Environment-dependent build behavior
if [ -n "$DEBUG" ]; then
  CFLAGS="-g -O0"
else
  CFLAGS="-O2"
fi

Implementing Reproducible Builds

Source Control Practices

Explicit Dependencies

Specify exact versions of all build dependencies:

// package.json excerpt with exact versions (no ^ or ~ ranges)
{
  "dependencies": {
    "express": "4.17.1",
    "lodash": "4.17.21"
  }
}
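A build pipeline can check for unpinned versions automatically. The Python sketch below is a hypothetical check that flags any direct dependency whose version is not an exact `MAJOR.MINOR.PATCH` string. Its regular expression is a simplification and does not implement npm's full version-range grammar.

```python
import json
import re

# Exact versions only, with an optional prerelease/build suffix. Anything else
# (^4.17.1, ~4.17.1, >=4, *, latest, git URLs) counts as unpinned in this sketch.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?$")


def unpinned_dependencies(package_json: str) -> list[str]:
    """Return names of direct dependencies whose version spec is not exact."""
    manifest = json.loads(package_json)
    deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
    return sorted(name for name, spec in deps.items() if not EXACT_VERSION.match(spec))


example = '{"dependencies": {"express": "4.17.1", "lodash": "^4.17.21"}}'
print(unpinned_dependencies(example))  # ['lodash']
```

Pinning direct dependencies does not fix the versions of transitive dependencies. That is the job of a lockfile.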

Vendoring

Include third-party dependencies directly in your source repository:

vendor/
  dependency1/
  dependency2/

Lockfiles

Use lockfiles to specify exact dependency versions:

# Example yarn.lock entry
lodash@^4.17.21:
  version "4.17.21"
  resolved "https://registry.yarnpkg.com/lodash/-/lodash-4.17.21.tgz#679591c564c3bffaae8454cf0b3df370c3d6911c"
  integrity sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==
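The `integrity` field follows the Subresource Integrity format: an algorithm name, a dash, then the base64-encoded digest. The Python sketch below shows how a package manager can check a downloaded tarball against that field. For simplicity it assumes a single hash per field, although the format allows several space-separated entries. The tarball bytes are a placeholder.

```python
import base64
import hashlib


def sri_integrity(data: bytes, algorithm: str = "sha512") -> str:
    """Compute an SRI-style string such as 'sha512-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(data: bytes, integrity: str) -> bool:
    """Check downloaded bytes against a single-hash lockfile 'integrity' value."""
    algorithm, _, expected = integrity.partition("-")  # base64 never contains '-'
    actual = base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")
    return actual == expected


tarball = b"stand-in for a downloaded package tarball"
integrity = sri_integrity(tarball)
print(verify_integrity(tarball, integrity))                # True
print(verify_integrity(tarball + b"tampered", integrity))  # False
```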

Build Environment Controls

Containerization

Use container technologies to provide consistent build environments:

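# Dockerfile for build environment
# Tags like bullseye-slim can be re-pointed, and apt-get fetches whatever versions
# the mirror has today; pinning the image by digest and package versions helps.
FROM debian:bullseye-slim
RUN apt-get update && apt-get install -y --no-install-recommends build-essential
WORKDIR /build
COPY . .
RUN make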

Build Script Determinism

Design build scripts to eliminate non-deterministic elements:

# Setting timestamp to a fixed value
export SOURCE_DATE_EPOCH=1577836800  # 2020-01-01 00:00:00 UTC
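Setting the variable only helps if the tools honor it. The SOURCE_DATE_EPOCH convention asks tools to use its value, an integer count of seconds since the Unix epoch interpreted in UTC, wherever they would otherwise embed the current time. The Python sketch below shows that behavior. `build_datetime` is a hypothetical helper name.

```python
import os
from datetime import datetime, timezone


def build_datetime() -> datetime:
    """Timestamp to embed in artifacts, honoring SOURCE_DATE_EPOCH when it is set."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        # int() raises ValueError on malformed values, so a bad setting fails loudly
        # instead of silently falling back to the clock.
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    # Fallback: wall-clock time, which makes the output non-reproducible.
    return datetime.now(tz=timezone.utc)


os.environ["SOURCE_DATE_EPOCH"] = "1577836800"
print(build_datetime().isoformat())  # 2020-01-01T00:00:00+00:00
```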

Environment Variable Control

Explicitly set or strip environment variables that might affect builds:

# Clear user-specific environment variables
unset LANG LC_ALL HOME USER
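Instead of unsetting variables one by one, a build driver can start the build with an explicit allow-list, so nothing from the caller's shell leaks in. The Python sketch below assumes a POSIX system. The variables in `BUILD_ENV` and their values are illustrative choices, not a required set.

```python
import os
import subprocess
import sys

# Hypothetical allow-list: the only variables the build will see, with fixed values.
BUILD_ENV = {
    "PATH": "/usr/bin:/bin",
    "LC_ALL": "C",
    "TZ": "UTC",
    "SOURCE_DATE_EPOCH": "1577836800",
}


def run_build(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd with BUILD_ENV instead of inheriting the caller's environment."""
    return subprocess.run(cmd, env=BUILD_ENV, capture_output=True, text=True, check=True)


# DEBUG is set in the calling process, but the build never sees it.
os.environ["DEBUG"] = "1"
probe = [sys.executable, "-c", "import os; print(os.environ.get('DEBUG'))"]
print(run_build(probe).stdout.strip())  # None
```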

Compiler and Build Tool Configuration

Stable Output Ordering

Configure tools to use stable ordering of inputs:

# Sort input files to ensure consistent order
SOURCES := $(sort $(wildcard src/*.c))

Deterministic Flags

Use compiler flags that enhance determinism:

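# GCC flags for more deterministic output: -ffile-prefix-map (GCC 8+) rewrites
# /build/dir to "." in debug info and __FILE__; -fdebug-prefix-map covers debug
# info on older GCCs; -frandom-seed fixes the seed used for generated symbol names
gcc -c main.c -o main.o -ffile-prefix-map=/build/dir=. -fdebug-prefix-map=/build/dir=. -frandom-seed=42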

Strip Timestamps

Remove or normalize timestamps in outputs:

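# Stripping timestamps from a ZIP file: pin mtimes, sort inputs, drop extra fields
export TZ=UTC  # ZIP stores local time
find src -exec touch -h -d "@$SOURCE_DATE_EPOCH" {} +
find src -print | LC_ALL=C sort | zip -X -@ out.zip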
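The same normalization can be applied when a tool writes archives itself. The Python sketch below uses the standard `zipfile` module and fixes the three things that usually vary: entry order, timestamps, and permissions. The `deterministic_zip` function and its 1980 clamp are this sketch's own choices. The clamp exists because the ZIP date format cannot represent earlier dates.

```python
import io
import time
import zipfile

ZIP_MIN_EPOCH = 315532800  # 1980-01-01 00:00:00 UTC, earliest date ZIP can store


def deterministic_zip(files: dict[str, bytes], source_date_epoch: int) -> bytes:
    """Build a ZIP whose bytes depend only on names, contents, and the epoch."""
    # ZIP stores a zone-less DOS date/time; take UTC fields so the host's
    # timezone cannot leak into the archive.
    date_time = time.gmtime(max(source_date_epoch, ZIP_MIN_EPOCH))[:6]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.external_attr = 0o644 << 16  # fixed Unix permissions
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buf.getvalue()


payload = {"b.txt": b"second\n", "a.txt": b"first\n"}
first = deterministic_zip(payload, 1577836800)
second = deterministic_zip(dict(reversed(payload.items())), 1577836800)
print(first == second)  # True
```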

Tools for Reproducible Builds

Build Comparison Tools

  • diffoscope: In-depth comparison of files beyond a binary diff
  • reprotest: Tests a build system for reproducibility issues
  • .buildinfo files: Debian's format for recording the build environment so others can recreate it

Build Systems and Language-Specific Tools

  • Bazel: Build system with reproducibility features
  • Gitian: Script for creating deterministic builds (used by Bitcoin Core before its move to Guix)
  • Nix/Guix: Package managers with reproducible build capabilities
  • Maven Reproducible Build Plugin: For Java projects

Integration and Workflow Tools

  • Reproducible Builds CI: Continuous integration setups that verify reproducibility
  • Rebuilders: Services that independently rebuild packages to verify them
  • SOURCE_DATE_EPOCH: Environment variable standard for build timestamps

Real-World Reproducible Build Initiatives

Debian Reproducible Builds

The Debian Linux distribution has been working on making packages reproducible since 2013, with over 90% of packages now building reproducibly.

Bitcoin Core

Bitcoin Core used Gitian for its reproducible builds until version 22.0, when it switched to Guix. Both approaches let users verify that release binaries haven't been tampered with:

# Verify a Gitian-era release (the script took a GPG signer name and a version)
./contrib/gitian-build.py --verify <signer> 0.21.0
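Whatever the tooling, verification ends with a comparison: a locally rebuilt file should hash to the digest the project or other builders published. The Python sketch below parses `sha256sum`-style lines (`<hex digest>  <file name>`) and checks a rebuilt artifact against them. The file name and contents are made up.

```python
import hashlib


def parse_sha256sums(text: str) -> dict[str, str]:
    """Parse `sha256sum`-style lines: '<hex digest>  <file name>'."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*' on the name.
        sums[name.lstrip("*")] = digest.lower()
    return sums


def matches_published(name: str, rebuilt: bytes, sums_text: str) -> bool:
    """True if a locally rebuilt artifact hashes to the digest published for it."""
    expected = parse_sha256sums(sums_text).get(name)
    return expected is not None and hashlib.sha256(rebuilt).hexdigest() == expected


# Made-up artifact and checksum file, standing in for a real release.
artifact = b"stand-in for a release tarball"
published = hashlib.sha256(artifact).hexdigest() + "  example-release.tar.gz\n"
print(matches_published("example-release.tar.gz", artifact, published))  # True
```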

F-Droid

F-Droid, an alternative Android app store, builds applications from source code on its own build servers. For apps that support it, F-Droid also checks that its build matches the developer's signed release.

Tor Browser

The Tor Browser is built reproducibly to decrease the risk of targeted malware being inserted during the build process.

Best Practices for Reproducible Builds

  1. Start Early: Design for reproducibility from the beginning rather than retrofitting
  2. Document Requirements: Clearly specify all build dependencies and environment requirements
  3. Version Everything: Keep all build tools and dependencies under version control
  4. Test Reproducibility: Regularly verify that builds are reproducible across different environments
  5. Use Containers: Isolate build environments with containers or VMs
  6. Fix Sources of Non-Determinism: Identify and eliminate timestamps, random seeds, and ordering issues
  7. Build Verification: Implement processes to regularly verify official builds against source code
  8. Public Rebuilds: Support independent rebuilding and verification by third parties
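Practices 4 and 6 can start with a very small automated check: run the build twice and compare the outputs. The Python sketch below treats a "build" as any function that writes files into a directory. It runs each build in a fresh temporary directory and compares a digest of the whole output tree. This is an assumption-laden toy. Tools like reprotest go further by also varying time, locale, user, and other environment details between runs.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable

# In this sketch a "build" is any function that writes artifacts into a directory.
BuildFn = Callable[[Path], None]


def tree_digest(out_dir: Path) -> str:
    """Hash each output file's relative path, size, and contents in sorted order."""
    h = hashlib.sha256()
    files = sorted((p for p in out_dir.rglob("*") if p.is_file()),
                   key=lambda p: p.relative_to(out_dir).as_posix())
    for path in files:
        data = path.read_bytes()
        h.update(path.relative_to(out_dir).as_posix().encode() + b"\0")
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()


def is_reproducible(build: BuildFn) -> bool:
    """Run the build twice, each in a fresh temporary directory, and compare."""
    digests = []
    for _ in range(2):
        with tempfile.TemporaryDirectory() as tmp:
            build(Path(tmp))
            digests.append(tree_digest(Path(tmp)))
    return digests[0] == digests[1]


def good_build(out: Path) -> None:
    (out / "app.bin").write_bytes(b"deterministic payload")


def bad_build(out: Path) -> None:
    # Embeds fresh random bytes, like an unseeded generated ID.
    (out / "app.bin").write_bytes(b"payload " + os.urandom(8))


print(is_reproducible(good_build))  # True
print(is_reproducible(bad_build))   # False
```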

Future of Reproducible Builds

Emerging Standards

  • SLSA Framework: Levels of software supply chain security centered on build integrity and provenance, which reproducible builds complement
  • in-toto: Framework to secure the integrity of software supply chains
  • Binary Transparency: Public logs of software releases for verification

Integration with Other Supply Chain Security Measures

  • SBOMs: Software Bill of Materials to document components
  • Sigstore: Platform for signing, verifying, and protecting software
  • Verifiable Artifact Registries: Repositories that track and verify build provenance