C++: Build Systems

Keywords

cpp, cmake, build systems, compilation, linking, translation units, odr, package managers, vcpkg, conan, headers

Introduction

The new hire cloned the repository on Monday and ran the one command that promised to build everything. It failed: the compiler could not find a header that plainly existed in the tree — an include path the project assumed but never wrote down. She fixed that, and the link step failed instead, with an undefined reference to a function whose definition sat right there in a .cpp the build had never compiled. She fixed that, and a third-party library would not link at all, because the version her package manager installed had been built against a different standard library than the rest of the project. By Wednesday she had a working binary. The codebase had not changed a line; what she had fought for two days was not the program but the build.

Then on Thursday she changed one line (a field added to a struct in a widely included header) and watched the rebuild crawl for forty minutes, recompiling hundreds of files that had no logical dependence on her change. Welcome to C++ build systems, where the gap between “it compiles on my machine” and “it compiles on yours” is wide enough to lose a week in, and where a trivial edit can rebuild the whole project.

None of this is bad luck. It is the predictable consequence of how C++ turns source into a program — and of the fact that, unlike Rust with cargo or JavaScript with npm, C++ ships with no built-in build tool and no package registry at all. The standard defines how code compiles; it says nothing about orchestrating a hundred files, finding a library, or pinning a version. The ecosystem filled that vacuum with CMake to describe builds and package managers like vcpkg and Conan to fetch dependencies — and once you understand the compile-and-link model, this tooling stops being voodoo.

The Core Insight

C++ splits the journey from source to binary into two phases that know almost nothing about each other. Compilation takes each .cpp file — a translation unit — and compiles it in complete isolation into an object file: machine code plus a table of the symbols it defines and the symbols it still needs. The compiler sees one unit at a time and never looks at your other source files. Linking then takes all those object files plus any external libraries and stitches them into one binary, matching each “I need this symbol” against some “I define this symbol” elsewhere.

The two phases are bridged by the #include model — a literal, textual copy-paste of the header into the translation unit, so a header included by fifty files is compiled fifty times — and governed by the One Definition Rule (ODR): every entity has exactly one definition across the program, and every unit that uses it must see an identical one. Violate it (two units disagreeing on a type’s layout because they were compiled with different flags) and the program can link cleanly and then corrupt itself at runtime in ways no error message explains. Both the slow rebuilds and the worst bugs trace to this bridge, as the rest of the chapter shows.

The model is powerful but unmanaged: there is no standard tool to say “these ten files form a library, it depends on OpenSSL, build it.” So the ecosystem standardized on CMake, which lets you describe targets and dependencies declaratively and generates the actual build, plus package managers to supply the libraries CMake links against.

A mental model

Picture the compiler as a print shop that only ever sees one chapter of a book at a time. You hand it a chapter (a translation unit), it typesets it into finished pages (an object file), and it leaves blank cross-references wherever the chapter mentions something defined in a chapter it never saw — it just notes the reference. The linker is the bookbinder who collects every typeset chapter, resolves every cross-reference against the chapter that defines it, binds in any pre-printed appendices (external libraries), and produces one finished book (the binary). If two chapters were typeset from different editions of the same shared appendix, the bound book looks fine on the shelf and contradicts itself when you read it — an ODR violation. Headers are the shared style guide every chapter copies in before typesetting.

CMake is neither print shop nor bindery — it is the production editor writing the work order: which chapters exist, which appendices they need, in what order. CMake compiles nothing itself; it produces the instructions (build.ninja) the others follow. Figure 24.1 shows the whole pipeline: source and headers flowing into independent compilation, the linker pulling object files and libraries together, and CMake above it all, describing the graph and generating the build.

When to reach for what

For the build tool, the decision is nearly made for you: CMake is the de facto standard. Most C++ libraries ship a CMakeLists.txt and expect you to as well; IDEs, package managers, and CI all assume it. The alternatives have real strengths — Meson reads more cleanly, Bazel wins decisively for giant hermetic monorepos with remote caching — but for an ordinary cross-platform project, choosing anything else means swimming against the entire ecosystem. Choose CMake unless you have a specific, defensible reason not to.

For dependencies, the question is how many and how complex. One or two libraries you pin yourself fit CMake’s built-in FetchContent; a real graph earns a package manager (vcpkg or Conan), covered below. The rule that matters more than the choice: pick one strategy per project and commit to it. Mixing managers is how you end up with two incompatible builds of the same library in one binary.

What you’ll learn

  • How C++ turns source into a binary through independent compilation and a separate link step, and why that two-phase model shapes everything else
  • Why the textual #include model and the One Definition Rule cause long builds, slow rebuilds, and the most baffling category of C++ bug
  • How to write modern, target-based CMake — and why the old global-variable style is a trap
  • How CMake’s configure / generate / build phases differ, and why you rarely need to re-run all three
  • When to reach for FetchContent versus a package manager (vcpkg or Conan), and how each makes a build reproducible
  • How to make builds fast: incremental compilation, precompiled headers, ccache, and why C++20 modules attack the #include problem at its root
  • How to wire C++ into CI with matrix builds and sanitizers

Prerequisites

  • C++: Fundamentals — what a translation unit, a header, and the standard library are; how source becomes an object file. The compilation model here is the machinery underneath that material.
  • Comfort at a shell: running commands, reading exit codes, and following compiler and linker error output.
  • A working toolchain (GCC 9+, Clang 9+, or MSVC 2019+) and CMake 3.16 or newer — the version that made precompiled headers and unity builds first-class.

The compilation model: where the time goes

Everything here is downstream of one fact. Building a C++ program runs three stages in order. The preprocessor expands every #include, #define, and #ifdef — textually, mechanically — turning each .cpp into a self-contained stream with all its headers pasted in. The compiler takes that one expanded translation unit and compiles it, alone, into an object file: machine code plus a symbol table of what it defines and what it references but does not define. Then the linker resolves those references — matching each undefined symbol to a definition in some other object file or library — and emits the binary. Figure 24.1 traces this end to end.

The crucial property is that each translation unit is compiled in total isolation. The compiler working on parser.cpp cannot see main.cpp; it knows about a function defined elsewhere only because a header declared its signature. This is what makes the build parallelize so well (a hundred units on a hundred cores), but it is also the source of the opening story’s two pathologies. The “undefined reference” was the linker failing to find a definition for a symbol that some object file referenced: the header promised a function existed, so the calling code compiled, but the .cpp that defined it was never added to the build, so no object file supplied it. Every file compiled; linking failed because a promise went unfulfilled.
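The split between a declaration (the promise) and a definition (its fulfilment) can be sketched in a single file. The three file names in the comments and the functions parse_digit and sum_digits are hypothetical, invented only for illustration:

```cpp
// Single-file stand-in for three files; the file names are hypothetical.

// --- parser.h: the declaration every includer sees (a promise, no code) ---
int parse_digit(char c);

// --- main.cpp: compiles against the declaration alone ---
int sum_digits(const char* text) {
    int total = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        total += parse_digit(*p);   // left as an unresolved symbol in this unit's object file
    }
    return total;
}

// --- parser.cpp: the definition that fulfils the promise ---
// Leave this file out of the build and sum_digits still compiles;
// the link step then fails with an undefined reference to parse_digit(char).
int parse_digit(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : 0;
}
```

In a real project the three sections live in separate files, and only the build description decides whether the last one ever reaches the linker.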

The slow rebuild comes from the textual nature of #include. Because a header is copied verbatim into every unit that includes it, its contents are part of the input to each of those compilations. Change the header — even by adding a field nobody references — and every unit that included it must recompile. A header included by three hundred files is, for rebuild purposes, a shared dependency of three hundred compilations. That is the forty-minute rebuild: not a slow compiler, but a header at the center of a coupling web, dragging the whole graph along whenever it twitches. The structural fix is to include less — forward declarations where only a pointer or reference is needed, and the pimpl idiom to hide a class’s private members behind an opaque pointer so its header stops changing with its implementation.
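A minimal sketch of the pimpl idiom follows, assuming a hypothetical Widget class split across widget.h and widget.cpp (shown here in one file so it compiles on its own). The point is that the private members sit behind a forward-declared Impl, so editing them never changes what includers see:

```cpp
#include <memory>
#include <string>
#include <utility>

// --- widget.h (hypothetical): all that includers ever see ---
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                          // declared here, defined where Impl is complete
    const std::string& name() const;
    int bump();                         // increments and returns a hidden counter
private:
    struct Impl;                        // forward declaration: layout stays out of the header
    std::unique_ptr<Impl> impl_;
};

// --- widget.cpp (hypothetical): private members live here ---
struct Widget::Impl {
    std::string name;
    int counter = 0;                    // adding a field here does not touch widget.h
};

Widget::Widget(std::string name) : impl_(std::make_unique<Impl>()) {
    impl_->name = std::move(name);
}
Widget::~Widget() = default;
const std::string& Widget::name() const { return impl_->name; }
int Widget::bump() { return ++impl_->counter; }
```

The trade is one heap allocation and one pointer hop per object in exchange for a header that stops changing with the implementation. That usually pays off for widely included classes and rarely for small value types.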

Underneath both sits the One Definition Rule, the contract that makes independent compilation safe. A non-inline function or variable may be defined exactly once across the program, though it may be declared in many units. Entities that are legitimately defined in every unit that uses them (a class, an inline function, a template) must have token-for-token identical definitions everywhere. The compiler cannot check this, because it sees one unit at a time, so the linker and runtime inherit the consequences when you break it. The vicious way to break it is not writing two definitions on purpose, but compiling the same header in two units with different flags, so the same struct ends up with a different memory layout in each. The program links without complaint and then reads a field at the wrong offset, which is exactly the failure in the war story below. ODR correctness depends on building your whole dependency graph with one consistent toolchain and one consistent set of flags.
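The layout disagreement can be made visible without actually breaking the rule. In the sketch below, two namespaces stand in for two translation units that compiled the same header with and without a hypothetical -DWITH_DEBUG_BOOKKEEPING flag. The Record type and the flag name are invented for illustration:

```cpp
#include <cstddef>

// Two namespaces stand in for two translation units that compiled the "same"
// header with and without a hypothetical -DWITH_DEBUG_BOOKKEEPING flag.
namespace built_without_flag {
struct Record {
    int id;
    double value;
};
}  // namespace built_without_flag

namespace built_with_flag {
struct Record {
    void* debug_owner;   // extra bookkeeping the flag switches on
    int id;
    double value;
};
}  // namespace built_with_flag

// Where each side believes the `id` field lives.
constexpr std::size_t id_offset_without_flag() {
    return offsetof(built_without_flag::Record, id);
}
constexpr std::size_t id_offset_with_flag() {
    return offsetof(built_with_flag::Record, id);
}

// True when a writer built one way and a reader built the other way would
// touch different bytes for the same field.
constexpr bool layouts_disagree() {
    return id_offset_without_flag() != id_offset_with_flag()
        || sizeof(built_without_flag::Record) != sizeof(built_with_flag::Record);
}
```

In a real violation both structs carry the same name, so the linker sees one type and merges the two halves without complaint. The namespaces here exist only so that both layouts can be inspected side by side.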

CMake fundamentals: describe the graph, generate the build

CMake’s central idea is that you should not write build commands; you should describe what you are building and let CMake generate them. The unit of that description is the target — an executable or a library — which owns its properties: its sources, include directories, compile flags, and the targets it depends on. You declare a target, attach properties, and wire it to its dependencies; CMake works out the build graph and emits a build.ninja (or Makefile) for a backend to execute. A description for a small multi-file program is genuinely small:

cmake_minimum_required(VERSION 3.16)
project(MyApp VERSION 1.0.0 LANGUAGES CXX)

# A library target: a few sources plus the headers it publishes.
add_library(mylib src/parser.cpp src/utils.cpp)
target_include_directories(mylib PUBLIC include)   # PUBLIC = mylib AND its users see this
target_compile_features(mylib PUBLIC cxx_std_17)   # mylib requires C++17, and so do its users

# An executable that depends on the library — that one line carries the include
# dirs, flags, and standard that mylib declared PUBLIC.
add_executable(myapp src/main.cpp)
target_link_libraries(myapp PRIVATE mylib)

The single most important habit in modern CMake is hidden in those PUBLIC and PRIVATE keywords. They control propagation: a PUBLIC property is used by the target and inherited by anything that links it; a PRIVATE property is used by the target alone; an INTERFACE property is the reverse — not used to build the target, only inherited by its users (exactly what a header-only library wants, since there is nothing to compile). Get this right and dependencies flow automatically: when myapp links mylib, it picks up mylib’s public include directory and C++17 requirement because those were PUBLIC. This is why modern CMake is called target-based — and why the older style of global commands like include_directories(), which dump settings into every target in scope, is something to unlearn. Global commands create invisible, project-wide coupling: every executable silently inherits every include path, and nobody can tell from the file which target actually needs what. Attach properties to targets, never to the project.
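The propagation rule is easier to trust once you have traced it by hand. What follows is a toy model in C++, not CMake's implementation: the Target, Project, and build_includes names are hypothetical, and only include directories are modelled. It computes which directories end up on the include path when a target's own sources are compiled:

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Visibility { Private, Public, Interface };

struct Target {
    std::vector<std::pair<std::string, Visibility>> include_dirs;  // directory, keyword used
    std::vector<std::pair<std::string, Visibility>> links;         // dependency, keyword used
};

using Project = std::map<std::string, Target>;

// What a target exposes to anything that links it: its PUBLIC and INTERFACE
// directories, plus whatever its PUBLIC/INTERFACE dependencies expose.
void collect_interface(const Project& project, const std::string& name,
                       std::set<std::string>& out, std::set<std::string>& visited) {
    if (!visited.insert(name).second) return;  // already handled
    const Target& target = project.at(name);
    for (const auto& dir : target.include_dirs) {
        if (dir.second != Visibility::Private) out.insert(dir.first);
    }
    for (const auto& link : target.links) {
        if (link.second != Visibility::Private) {
            collect_interface(project, link.first, out, visited);
        }
    }
}

// What is on the include path when compiling the target's own sources:
// its PRIVATE and PUBLIC directories, plus the interface of everything it
// links with PRIVATE or PUBLIC.
std::set<std::string> build_includes(const Project& project, const std::string& name) {
    std::set<std::string> out;
    std::set<std::string> visited;
    const Target& target = project.at(name);
    for (const auto& dir : target.include_dirs) {
        if (dir.second != Visibility::Interface) out.insert(dir.first);
    }
    for (const auto& link : target.links) {
        if (link.second != Visibility::Interface) {
            collect_interface(project, link.first, out, visited);
        }
    }
    return out;
}
```

Take a myapp that links mylib PRIVATE, where mylib publishes include as PUBLIC and keeps src/detail PRIVATE. The model gives myapp only include, which matches the behavior the paragraph describes.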

Worth internalizing early too are CMake’s three phases, because conflating them wastes time. Configure reads your CMakeLists.txt, probes the compiler, and resolves find_package calls. Generate turns the resulting model into concrete build files — once. Build runs the backend to compile and link, and is the repeatable step you run constantly. In practice that is two commands:

cmake -B build -G Ninja          # configure + generate (do this once)
cmake --build build              # build (repeat after every code edit)

You only re-configure when CMakeLists.txt itself changes, and CMake usually notices and re-runs configure for you. Day to day you live in cmake --build build, which recompiles only the translation units whose inputs changed — looping straight back to why header coupling hurts.

Dependency management: making “it builds here” portable

CMake describes your code; it does not, on its own, fetch the libraries you depend on. For libraries already installed on the system, find_package locates them and hands you an imported target you link like any other: find_package(OpenSSL REQUIRED) gives you OpenSSL::SSL. But “already installed on the system” is exactly the assumption that breaks on a new machine, which is the problem the new hire lost her first days to. The fix is to stop relying on what happens to be installed and declare your dependencies as part of the project, so a fresh checkout fetches the right versions.

The lightest option ships inside CMake: FetchContent downloads a dependency (typically from a pinned git tag) during the configure step and then builds it alongside your own targets, with no separate tool and no system-wide install. It is ideal for a handful of dependencies whose versions you control directly:

include(FetchContent)
FetchContent_Declare(
  fmt
  GIT_REPOSITORY https://github.com/fmtlib/fmt.git
  GIT_TAG 10.1.1                       # pin a tag — never a moving branch
)
FetchContent_MakeAvailable(fmt)
target_link_libraries(myapp PRIVATE fmt::fmt)

Once the graph grows past a few libraries — transitive dependencies, multiple platforms, or fast CI — a dedicated package manager pays for itself. vcpkg uses a manifest (vcpkg.json) listing the libraries you need and integrates with CMake through a single toolchain file; its binary caching means CI does not rebuild Boost from source on every run. Conan is the Python-based alternative, especially strong for cross-compilation and libraries with many build-time options. Both make resolution declarative and reproducible: the manifest names the versions, the tool fetches them, and find_package resolves against what the tool provided. A vcpkg manifest is just a list, picked up through the toolchain file:

{ "name": "myapp", "version": "1.0.0",
  "dependencies": ["fmt", "spdlog", "nlohmann-json"] }

Whichever you choose, the discipline is the same: pin versions explicitly. For FetchContent that is a git tag or commit; for a package manager it is whatever the tool uses to fix versions, and a bare list of names like the manifest above does not do it alone (vcpkg, for example, adds a builtin-baseline field, plus optional version overrides). Two managers in one project can each supply their own build of fmt at slightly different versions, both link, and produce a runtime-corrupting ODR violation. One strategy, pinned versions, a build a teammate reproduces from a clean checkout: that is the whole point of taking dependencies seriously.

Build performance: incremental builds, caching, and modules

A C++ build is slow for two reasons with different cures. The first is redundant work across rebuilds, and the answer is incremental compilation: the build system tracks each object file’s inputs and recompiles only the units whose inputs changed — mostly automatic, but only as good as your include hygiene. The forty-minute rebuild from the opening story is incremental compilation working correctly: the header genuinely changed, so every dependent unit genuinely must recompile. The cure is fewer dependencies on that header, not a faster build system.
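The bookkeeping behind "recompile only the units whose inputs changed" can be sketched as a walk over the include graph. This is a simplified model with a hand-written, hypothetical graph. Real build systems get the same information from compiler-generated dependency files rather than a map like this one:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// file -> files it directly #includes (a hand-written, hypothetical graph).
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Does `file` include `changed`, directly or through other headers?
bool includes_transitively(const IncludeGraph& graph, const std::string& file,
                           const std::string& changed, std::set<std::string>& visited) {
    if (!visited.insert(file).second) return false;   // already explored (handles cycles)
    const auto it = graph.find(file);
    if (it == graph.end()) return false;              // a leaf header with no includes
    for (const std::string& included : it->second) {
        if (included == changed) return true;
        if (includes_transitively(graph, included, changed, visited)) return true;
    }
    return false;
}

// The translation units whose inputs contain `changed` and so must recompile.
std::vector<std::string> units_to_rebuild(const IncludeGraph& graph,
                                          const std::vector<std::string>& units,
                                          const std::string& changed) {
    std::vector<std::string> stale;
    for (const std::string& unit : units) {
        std::set<std::string> visited;
        if (unit == changed || includes_transitively(graph, unit, changed, visited)) {
            stale.push_back(unit);
        }
    }
    return stale;
}
```

Feed it a graph where one header sits under most units and the returned list is most of the project. That is the forty-minute rebuild expressed as data.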

The second reason is repeated work the toolchain could skip, and several tools attack it. Precompiled headers (PCH) compile the heavy standard headers (<string>, <vector>) once into a form every unit reuses, a large win for projects leaning on big templated headers. ccache caches each object file keyed by the exact preprocessed input and flags, serving the cached object on a hit across rebuilds, branches, and a CI fleet; with a warm cache, a one-line configuration change can cut a from-scratch CI build to a fraction of its cold time. Unity builds concatenate several units into one to amortize per-file overhead. And switching the backend from Make to Ninja is almost free speed: it was built for this graph and schedules parallel work better.

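# In CMakeLists.txt: compile the heavy standard headers once, reused by every dependent TU.
#   target_precompile_headers(mylib PUBLIC <string> <vector> <memory>)
# At the shell: cache compilations across rebuilds, branches, and CI runs, and use Ninja.
# Passing the launcher with -D works on CMake 3.16; the environment-variable form
# (export CMAKE_CXX_COMPILER_LAUNCHER=ccache) is only read by CMake 3.17 and newer.
cmake -B build -G Ninja -DCMAKE_CXX_COMPILER_LAUNCHER=ccache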
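To see why keying on preprocessed input plus flags is the right cache key, here is a toy model of a compilation cache. It is not ccache's implementation: the CompileCache class and its fake_compile stand-in are hypothetical, and the full text is used as the key where a real cache would store a hash of it:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Toy model of a compilation cache; not how ccache is actually implemented.
class CompileCache {
public:
    // Returns the "object file" for this input, compiling only on a miss.
    std::string compile(const std::string& preprocessed_source, const std::string& flags) {
        // Same source with different flags must not share an entry, so both go in the key.
        const std::string key = flags + '\n' + preprocessed_source;
        const auto hit = objects_.find(key);
        if (hit != objects_.end()) {
            ++hits_;
            return hit->second;
        }
        ++misses_;
        std::string object = fake_compile(preprocessed_source, flags);
        objects_.emplace(key, object);
        return object;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    // Stand-in for the expensive step: a real cache would invoke the compiler here.
    static std::string fake_compile(const std::string& source, const std::string& flags) {
        return "object[" + flags + "](" + std::to_string(source.size()) + " bytes of input)";
    }

    std::unordered_map<std::string, std::string> objects_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

Because the key is the preprocessed text, editing a header that a unit includes changes that unit's key and forces a miss. A cache of this kind speeds up repeated builds of unchanged inputs, but it does nothing for the cascade that follows a change to a shared header.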

All of these mitigate a problem rooted in the #include model itself: recompiling the same textual headers over and over. C++20 modules attack that root directly. A module is compiled once into a binary interface that importing units consume directly, with no textual re-expansion and no macro leakage across the boundary; import std; (the standard-library module, added in C++23) references a pre-compiled artifact rather than pasting thousands of lines into your unit the way #include <vector> does. Modules promise to make the rebuild cascade largely disappear (change an implementation without touching the interface and dependents need not recompile), but toolchain and build-system support is still maturing, so today they are a direction, not a default. (Their mechanics belong to C++: Modern C++; the point here is that they are the structural answer to the problem PCH and ccache only paper over.)

CI for C++: matrices and sanitizers

C++ is portable in principle and platform-specific in practice, so CI leans on the matrix build: the same cmake -B build / cmake --build build / ctest sequence run across a grid of operating systems (Linux, macOS, Windows), compilers (GCC, Clang, MSVC), and build types (Debug, Release), with the package manager’s cache restored between runs. A bug that only surfaces under MSVC’s standard library, or only in an optimized Release build, is found by the matrix and nowhere else.

The other C++-specific lever is the sanitizer. AddressSanitizer and UndefinedBehaviorSanitizer instrument the binary at build time to catch memory errors and undefined behavior (use-after-free, buffer overruns, signed overflow, and some ODR violations) at the moment they happen, with a stack trace, instead of as a mysterious crash three functions later. They cost runtime performance, so you run them in a dedicated CI job; but in a language whose failure modes are this subtle, a Debug build with -fsanitize=address,undefined (the GCC and Clang spelling; MSVC offers AddressSanitizer through /fsanitize=address but has no UndefinedBehaviorSanitizer) running your test suite is among the highest-value CI you can add, turning “works until it doesn’t” bugs into deterministic, located failures.
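As a concrete instance of the defect class involved, consider the classic midpoint computation. The function names below are hypothetical. The first version is deliberately flawed to show what a -fsanitize=undefined build would flag, and the second is the usual repair:

```cpp
#include <climits>

// Deliberately flawed: lo + hi overflows when the sum exceeds INT_MAX, which is
// undefined behavior. A build with -fsanitize=undefined reports it at this line
// the first time a test passes large enough values.
int midpoint_unchecked(int lo, int hi) {
    return (lo + hi) / 2;
}

// Same result for 0 <= lo <= hi, with no intermediate overflow.
int midpoint_safe(int lo, int hi) {
    return lo + (hi - lo) / 2;
}
```

Without the sanitizer the flawed version typically returns a negative number for large inputs and the failure shows up somewhere else, for example as an out-of-range index. With the sanitizer the test run stops at the addition itself.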

War story: the crash that linked cleanly

A team chased a phantom for a week. Their service linked without a warning, passed unit tests, and then corrupted memory at random under load — never the same place twice, never reproducible in a debugger. The cause was an ODR violation hiding in plain sight. One static library in the dependency graph had been built with -D_GLIBCXX_DEBUG, a flag that changes the memory layout of standard containers by adding debug bookkeeping; the rest of the program was built without it. Both halves included the same <vector> header and agreed at the source level on what a std::vector was — but compiled with different flags, they disagreed on its actual size and field offsets. The linker cannot check the ODR; it matches symbol names, and the names matched perfectly, so it bound the mismatched halves into one binary. At runtime, one half wrote a vector’s fields at the offsets it believed in, and the other half read them at the offsets it believed in. Silent corruption.

There was no fix to find in the code, because the code was correct. The fix was in the build: rebuild the entire dependency graph with one consistent toolchain and one consistent set of flags. The lesson generalizes — it is why you never mix package managers, never link a Debug library into a Release binary, and pin your dependencies. ODR violations are the rare bug class where compiler and linker are no help at all; build consistency is the only defense, and a sanitizer build in CI the only early warning.


Practical exercise

Difficulty: Level I · Level II · Level III

  1. Level I — Build a small project with modern CMake. Take three or four source files plus a header or two and author a CMakeLists.txt from scratch in the target-based style: an add_library for the shared code, an add_executable for the entry point, target_link_libraries to connect them, and include directories and compile features with correct PUBLIC/PRIVATE visibility. Configure with Ninja and build. Then deliberately omit one .cpp from the library target, rebuild, and read the linker’s “undefined reference” — explain, in terms of the compile/link split, why compilation succeeded but linking failed.

  2. Level II — Add a reproducible external dependency. Add a real third-party library (fmt or spdlog) two ways and compare: first via FetchContent with a pinned git tag, then via a package manager (vcpkg with a vcpkg.json, or Conan). Pin every version and verify that a clean checkout on a different machine — or a fresh container — builds without manually installing anything. Note what “reproducible” cost you in each approach and when you would reach for which.

  3. Level III — Diagnose and fix a slow build. Take a project large enough to feel (your own, or a mid-sized open-source library), make a clean build, and record the wall-clock time. Change one widely-included header trivially — add an unused struct field — rebuild, and measure how much recompiles. Identify the header coupling causing the cascade and apply at least two fixes: forward declarations or pimpl to cut coupling, precompiled headers for the heavy standard includes, and ccache for repeated compilations. Re-run both measurements and report the before/after for the clean build and the one-line-change rebuild, attributing each improvement to its fix.

Summary

C++ has no built-in build tool or package registry, and that absence, not the language itself, is what makes C++ builds hard. The language turns source into a program in two phases that barely know each other: compilation turns each translation unit into an object file in isolation, and linking stitches the objects and external libraries together by matching symbols. The two are bridged by the textual #include model and governed by the One Definition Rule, and almost every build pain traces back there: undefined references when a definition never reaches the linker, forty-minute rebuilds when a shared header changes, and the worst bugs in the language when two units disagree on a shared definition. The ecosystem’s answer is CMake, which lets you describe targets and their dependency graph and generates the build, plus package managers (vcpkg, Conan) and FetchContent for declarative, reproducible dependencies. Build speed is a function of include hygiene first, tooling (PCH, ccache, Ninja, eventually C++20 modules) second.

Key takeaways

  • Compilation is per-translation-unit and isolated; linking resolves symbols across all of them. “Undefined reference” is a link-time failure to find a definition, not a compile error.
  • The #include model is textual copy-paste, so a header is recompiled in every unit that includes it — which is why one header change can cascade into a long rebuild. Include less: forward declarations and pimpl.
  • The One Definition Rule must hold across the whole program; the worst C++ bugs come from units that disagree on a shared definition (usually inconsistent flags or mixed dependency sources), and they link cleanly before they corrupt at runtime.
  • Modern CMake is target-based: attach properties to targets with correct PUBLIC/PRIVATE/INTERFACE visibility, never global commands. Configure once, build repeatedly.
  • Make dependencies reproducible — FetchContent for a few pinned libraries, a package manager for a real graph — and never mix package managers in one project.

Connections to other chapters

  • C++: Fundamentals (prerequisite): establishes what translation units, headers, and the standard library are; this chapter is the machinery that turns them into a binary. The RAII and value-semantics ideas you compile there only run because the compile-and-link pipeline here assembled them correctly.
  • C++: Modern C++ (extension): C++20 modules are the structural answer to the #include/rebuild problem PCH and ccache only mitigate. The mechanics of import live there; their motivation is the build cost analyzed here.
  • Go: Packages & Modules and TypeScript: The Node Ecosystem (contrast): both ship their dependency and build story in the box (go build and the module graph, npm and package.json), where C++ bolts CMake and vcpkg on after the fact. Reading them against this chapter shows what an integrated toolchain buys you, and what C++’s separate-compilation model costs to manage by hand.
  • Containerization with Docker (Part V, extension): a container makes a C++ build genuinely reproducible — pin the compiler, standard library, and package manager inside the image so “it builds on my machine” becomes “it builds in this image, everywhere.” The multi-stage build is the natural fit: a heavy builder stage compiles the binary, and a slim runtime stage ships only that binary, discarding the SDK the running program never needs.

Further reading

Essential

  • Craig Scott, Professional CMake: A Practical Guide — the definitive, current reference on target-based CMake; comprehensive and opinionated in the right directions.
  • Henry Schreiner et al., An Introduction to Modern CMake (cliutils.gitlab.io) — a free, well-paced online guide to the target-based style this chapter teaches.

Deep dives

  • The official CMake documentation (cmake.org) — the authoritative reference for commands, generator expressions, and find_package semantics.
  • The vcpkg (vcpkg.io) and Conan (docs.conan.io) documentation — manifest mode, toolchain integration, and binary caching, the practical core of reproducible dependency management.

Historical context

  • The history of C++ build tooling — from hand-written Makefiles and Autotools through CMake’s rise to dominance — explains why the ecosystem looks the way it does and why no standard build tool ever shipped with the language.
  • The C++20 modules proposals and design rationale — the standards-committee motivation for replacing the textual #include model, which is the long-term answer to the rebuild-cascade problem at the heart of this chapter.