Python: Microservices

Keywords

microservices, service boundaries, inter-service communication, api gateway, resilience, circuit breaker, distributed systems, monolith, retries, timeouts

Introduction

The team had read the same blog posts everyone read that year, and the conclusion felt obvious: the monolith was the problem. Deploys were scary, and two squads kept stepping on each other in one codebase. So they split it — early, enthusiastically, into a dozen services drawn along the org chart and the existing folder structure. For a few weeks it felt like progress. Then the bills came due, one at a time.

The first was velocity. A feature that used to be a single pull request now touched five repositories: a schema change here, a new endpoint there, client updates in three callers, a coordinated deploy across all of them in the right order. A refactor the compiler used to catch in one place was now a migration — versioned, backward-compatible, rolled out service by service over a week. The second bill was operational. One Tuesday the recommendations service got slow — not down, just slow, a query that had quietly grown to take eight seconds. The product service called it synchronously, with no timeout, on every page load. Those calls piled up, each one holding a worker hostage waiting for a reply that never came in time. Within ninety seconds the product service had no free workers, so its callers started timing out, and the checkout service behind them, and the gateway in front of all of it. A single slow query in a feature nobody would miss had become a full-site outage, because the failure had nowhere to stop.

What the team had built was not microservices. It was a distributed monolith: all the coupling of a single program, now spread across a network that could fail in the middle of a function call. They had paid the entire price of distribution and collected almost none of the benefit. This chapter is about the decision they got wrong and the disciplines that would have saved them — how to decide whether to split at all, where to draw the seams, how services should talk, and, above all, how to make a system of services survive the fact that the network between them is hostile.

The Core Insight

A monolith runs in one process. When one part of it calls another, the call is a function call: it cannot half-succeed, it cannot time out, it cannot find the other half temporarily unreachable. The arguments are real objects, the types can be checked statically by a tool such as mypy before anything ships, and a transaction can wrap the whole thing so it either all happens or none of it does. This is in-process simplicity, and it is worth far more than it gets credit for. Most of what makes a monolith pleasant is that the hardest problems in computing (partial failure, distributed consistency, network latency) simply do not exist inside one address space.

Microservices trade that simplicity away, deliberately, for two things a monolith cannot give you:

1. Independent deployability. Each service ships on its own cadence, owned by its own team, without a coordinated all-or-nothing release. At a large enough organization this is the difference between deploying daily and deploying monthly.
2. Independent scalability. You scale the component under load (the ML inference service to forty replicas, the CRUD service to two) instead of cloning the entire application to handle pressure on one part of it.

Those are real and valuable. But the bill for them is exact and non-negotiable: you inherit the full weight of distributed-systems complexity. The network is unreliable — calls drop, stall, and arrive twice. Failures become partial: A is up, B is down, and a request spanning both is in an undefined state with no transaction to undo it. Consistency stops being free, because no database transaction can span two services. The insight is that microservices are not an architecture you reach for to be “modern.” They are an organizational and operational tool with a steep, permanent tax, adopted only when the value of independent deploy-and-scale exceeds that tax. For most teams most of the time, a well-structured monolith is the correct starting point — and the right move is to extract services later, along seams production has already revealed, rather than guessing at boundaries on day one.

A mental model

Think of a monolith as one team in a single room. When someone needs something, they turn around and ask. Knowledge is shared, coordination is a conversation, and a change that affects three people happens in one meeting. It is fast and intimate, and it stops scaling when the room gets too crowded to hear yourself think.

Microservices are that same organization split into autonomous departments, each with its own staff, its own filing cabinet, and its own door. Departments do not reach into each other’s filing cabinets — that is the database-per-service rule, and it is what makes them independent. They communicate only through published contracts: a request form, an API, an event posted to a shared board. A department can reorganize its internals freely as long as it keeps honoring the contract at its door. That autonomy is the point. But it comes with the realities of any organization that talks through paperwork: messages get lost, replies are slow, and you sometimes have to act without knowing whether your last request was even received.

The other half of the model is the boundary itself. Inside a process, a call is safe; across the network, the boundary is hostile territory. Every synchronous call is a small expedition into a place where the other side might be slow, broken, or gone, and where your request might be delivered zero, one, or several times. You do not make that expedition unarmed. You go with a timeout so you don’t wait forever, a retry so a single dropped packet isn’t fatal, and a circuit breaker so you stop knocking on a door that clearly isn’t going to open. Most of the craft of microservices is learning to defend that boundary.

When to go microservices (and when a monolith wins)

The honest framework starts from a presumption against splitting, and asks what would justify it. Figure 11.1 shows the topology you are signing up to operate; the decision is whether that complexity buys you anything.

Split into services when independent deployability or scalability is a concrete, present pain: when multiple teams collide in a shared codebase and the coordination cost is real; when components have genuinely different scaling profiles and cloning the whole app to scale one part wastes money; when parts of the system have clear, stable bounded contexts already proven through months of production use. These are the conditions under which the distribution tax pays for itself.

Stay a monolith — ideally a modular one — when the team is small (coordination is a conversation, not a protocol), when the domain is still unclear (any boundary you draw is a guess you’ll regret), when you need strong consistency (in-process ACID is trivial; distributed sagas are not), or when DevOps maturity isn’t there yet (microservices demand CI/CD, observability, and on-call as table stakes). The load-bearing rule beneath all of this is monolith-first: you can always extract a well-designed module into a service once production shows you the boundary, but you cannot easily un-split a bad one — that means merging two deployed systems and their data, far harder than the split was. When in doubt, keep it in the monolith and invest in module design.

What you’ll learn

How to decide whether to adopt microservices at all, using independent deploy-and-scale as the test and “monolith-first” as the default
Where to draw service boundaries — along business domains, not technical layers — and how to extract services from proven seams rather than guessing
When to communicate synchronously (REST, gRPC) versus asynchronously (events, messaging), and how that choice changes coupling and resilience
What an API gateway is for — the single front door that handles routing, auth, and rate limiting so individual services don’t each reinvent them
How to defend a service against partial failure with timeouts, retries with backoff and jitter, circuit breakers, and bulkheads — and how those patterns stop a cascade
Why distributed consistency forces sagas and idempotency on you, and why distributed tracing stops being optional the moment you go distributed

Prerequisites

Python: Web Development — every service in this chapter is a single web service. We assume you can already build one: request lifecycle, validation with Pydantic, dependency injection, running it under an ASGI server. This chapter does not re-teach any of that; it is about what happens when you have many of them.
Python: Design Patterns — the Dependency Rule and the idea of seams. The same discipline that lets you swap a database behind an interface is what lets you later lift a module out into its own service.
Comfort with HTTP and async Python (async/await), since cross-service calls are almost always I/O-bound and concurrent.

Service boundaries and decomposition

The single most consequential decision in a microservice system is where the boundaries go, because boundaries are the one thing that is brutally expensive to change later. Drawn well, a service is a small, autonomous department that can evolve on its own. Drawn badly, your services are just a monolith’s modules with network latency bolted between them — and you have made everything harder while making nothing better.

The cardinal mistake is to split along technical layers: a “controller service,” a “business-logic service,” a “data-access service.” It feels tidy because it matches how the code is organized, and it is exactly wrong. A single user-facing feature — “place an order” — now traverses all three services, so every feature change touches every service and the services are useless without each other. You have maximized coupling across boundaries, the opposite of what boundaries are for.

The right axis is the business domain. A service should own one bounded context — Orders, Payments, Inventory, Users — chosen so that a change to a business capability lands inside a single service most of the time. The test is cohesion: things that change together should live together. A well-drawn Order service can change how it stores or prices an order without anyone noticing, because the only thing it exposes is its contract. Three rules make a boundary real rather than cosmetic: single responsibility (one reason to change), owning your data (the service’s database is private — no other service reads its tables), and communicating only through published contracts (APIs and events, never shared internals). The database-per-service rule is the one people break first and regret most. The moment two services read the same table they are coupled through its schema; a column rename becomes a cross-team migration, and you have rebuilt the distributed monolith through the back door.

How do you find these boundaries? Not on a whiteboard up front, where they are guesses. You find them by running a modular monolith and watching where the natural seams fall: which modules change together, which scale together, which one team would happily own. A seam that has stayed stable for months, that one team would own end to end, and that has a narrow interface to the rest of the system, is a seam you can extract with confidence. This is the seam-extraction approach, and it is why monolith-first is not timidity — it is how you earn the information you need to draw boundaries you won’t have to redraw.
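
One way to picture an extractable seam, as a minimal sketch rather than a prescribed design: the Orders code depends only on a narrow interface, so the in-process implementation could later be swapped for a network client without touching its callers. The names `UserDirectory`, `InProcessUserDirectory`, and `place_order` are hypothetical and exist only for illustration; the sketch assumes Python 3.9 or newer.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class User:
    id: int
    email: str


class UserDirectory(Protocol):
    """The narrow contract the Orders module is allowed to depend on."""

    def get_user(self, user_id: int) -> User: ...


class InProcessUserDirectory:
    """Monolith implementation: an in-memory lookup standing in for a module call."""

    def __init__(self, users: dict[int, User]) -> None:
        self._users = users

    def get_user(self, user_id: int) -> User:
        return self._users[user_id]


def place_order(directory: UserDirectory, user_id: int, sku: str) -> dict:
    """Orders logic sees only the contract, never the Users module's storage."""
    user = directory.get_user(user_id)
    return {"sku": sku, "user_id": user.id, "receipt_to": user.email}
```

If the Users module is ever extracted, a class with the same `get_user` method that calls the new service over HTTP could be passed in instead, and `place_order` would not change.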

Inter-service communication

Once you have more than one service, they have to talk, and the first real fork in the road is how. There are two families, and the choice is not a detail — it determines how tightly your services are coupled and how gracefully the system degrades when one of them is unhealthy.

Synchronous communication is request/response: A calls B and waits for the answer before continuing. Over HTTP/REST it is universal, easy to debug with curl, and human-readable; over gRPC it is faster and strongly typed, using Protocol Buffers as a binary contract over HTTP/2 — the usual choice for chatty internal calls where performance matters and a browser is never the client. Either way, the defining property is the wait. Synchronous calls give you a simple mental model and immediate consistency, but they create temporal coupling: A cannot make progress unless B is up and answering right now. That is the property that turned one slow service into a site-wide outage in the opening story. Every synchronous edge is a thread by which one service’s bad day can be pulled into another’s.

A light illustrative client shows the shape — note that the resilience knobs, not the happy path, are the interesting part:

import httpx

async def get_user(user_id: int) -> dict:
    """Fetch a user from the User service over its published HTTP contract.

    The timeout is the load-bearing argument: without it, a slow User
    service would hold this coroutine — and its caller — open indefinitely.
    """
    async with httpx.AsyncClient(timeout=2.0) as client:
        resp = await client.get(f"http://user-service:8000/api/v1/users/{user_id}")
        resp.raise_for_status()
        return resp.json()

Asynchronous communication breaks the wait. Instead of calling B directly, A publishes an event — “OrderCreated” — to a message bus and moves on; B (and C, and D) consume it later, on their own schedule. The producer does not know or care who is listening, which is the deepest decoupling available: A has no dependency on B being up, and you can add a fourth consumer without A ever changing. This is what makes event-driven systems resilient — a consumer can be down for an hour and catch up when it returns — and naturally scalable, since you add throughput by adding consumers. The cost is that you trade immediate consistency for eventual consistency and an easy-to-follow call stack for an asynchronous flow that is genuinely harder to debug. A useful default: queries that need an answer now are synchronous; state-change notifications that others merely need to know about — “this happened” — should almost always be events.
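
The decoupling described above can be sketched with an in-memory stand-in for a message bus. This is an illustration under stated assumptions, not a broker: `InMemoryBus` and `demo` are hypothetical names, and a real bus adds durability, redelivery, and ordering guarantees that this toy lacks.

```python
import asyncio
from collections import defaultdict


class InMemoryBus:
    """Toy stand-in for a message broker (no durability, no redelivery)."""

    def __init__(self) -> None:
        self._queues: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, event_type: str) -> asyncio.Queue:
        """Each consumer gets its own queue and drains it on its own schedule."""
        queue: asyncio.Queue = asyncio.Queue()
        self._queues[event_type].append(queue)
        return queue

    def publish(self, event_type: str, payload: dict) -> None:
        """Hand the event to every subscriber's queue and return without waiting."""
        for queue in self._queues[event_type]:
            queue.put_nowait(payload)


async def demo() -> list[str]:
    bus = InMemoryBus()
    billing = bus.subscribe("OrderCreated")
    shipping = bus.subscribe("OrderCreated")

    # The producer names the event, never the consumers.
    bus.publish("OrderCreated", {"order_id": 42})

    seen = []
    for name, queue in (("billing", billing), ("shipping", shipping)):
        event = await queue.get()
        seen.append(f"{name}:{event['order_id']}")
    return seen
```

Adding a third consumer is one more `subscribe` call; the publishing line does not change, which is the property the paragraph above describes.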

The lesson underneath both is that coupling is the thing you are managing. Synchronous calls couple in time; shared databases couple in schema; even events couple you to their payload format. The goal is never zero coupling — services have to cooperate — but to make the coupling explicit, narrow, and at the contract, where it can be versioned and defended.

The API gateway

If every client — browser, mobile app, partner integration — had to know the address of every service, call each one directly, and implement authentication, rate limiting, and TLS against each, you would have leaked your entire internal topology to the outside world and duplicated your cross-cutting concerns across every service. The API gateway is the answer: a single front door that sits between clients and the fleet, as shown in Figure 11.1.

The gateway handles, in one place, the concerns every service would otherwise reimplement: routing (mapping /orders to the Order service and /users to the User service, so clients see one coherent API instead of a dozen hostnames), authentication (verifying the caller’s token once, at the edge, so downstream services can trust an already-authenticated request), rate limiting (shedding abusive traffic before it reaches your services), and TLS termination. It is the seam between the messy outside world and your internal network. Because it is a single chokepoint it must be kept thin and well-tested — a bug or outage there takes down everything behind it. The gateway routes and enforces; it should not contain business logic, or you have invented a new monolith at the worst possible layer.
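
To make the gateway's two core jobs concrete, here is a minimal sketch of authenticate-once-then-route. Everything named here is an assumption for illustration: the `ROUTES` table, the `VALID_TOKENS` lookup (a real gateway would verify a signed token), the `X-Authenticated-User` header, and the `plan_forward` function. Rate limiting and TLS termination are omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing table: public path prefix -> internal upstream base URL.
ROUTES = {
    "/orders": "http://order-service:8000",
    "/users": "http://user-service:8000",
}

# Hypothetical token lookup standing in for real token verification.
VALID_TOKENS = {"token-abc": "user-17"}


class GatewayError(Exception):
    """Carries the HTTP status the gateway would return to the client."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status


@dataclass
class Forward:
    upstream_url: str
    headers: dict


def plan_forward(path: str, token: Optional[str]) -> Forward:
    """Authenticate once at the edge, then map the public path to an upstream."""
    caller = VALID_TOKENS.get(token or "")
    if caller is None:
        raise GatewayError(401, "missing or invalid token")
    for prefix, upstream in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            # Downstream services trust this header because only the gateway sets it.
            return Forward(upstream + path, {"X-Authenticated-User": caller})
    raise GatewayError(404, "no route for path")
```

Note what the function does not contain: no pricing, no order rules, nothing a product manager would recognize. That absence is the point of keeping the gateway thin.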

Resilience: defending against partial failure

This is the section that separates microservices that work from the distributed monolith that takes down your site. Inside a process, a function call cannot fail partway through. Across the network, every call can: it can be slow, fail outright, or fail for one request while succeeding for the next. The discipline of resilience is the set of defenses you put on each synchronous edge so one service’s failure stays contained instead of cascading through everyone who depends on it. There are four patterns, and they layer.

The first and most important is the timeout. A call with no timeout is the single most dangerous line of code in a distributed system, because a slow dependency doesn’t just fail your one request — it holds your worker (a thread, a connection, a coroutine slot) hostage for as long as the dependency stays slow. Under load those held workers accumulate until you have none left, and now you are down, not because you broke but because something downstream did. A timeout converts “hang forever” into “fail fast” — the precondition for every other pattern, since you cannot retry, fall back, or open a circuit on a call that never returns.
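
A minimal sketch of "fail fast" using only the standard library. The slow dependency is simulated with a sleep, and `slow_recommendations` and `recommendations_or_empty` are hypothetical names; the pattern, not the names, is what matters.

```python
import asyncio


async def slow_recommendations() -> list[str]:
    """Stand-in for a dependency that has become slow (simulated with a sleep)."""
    await asyncio.sleep(0.2)
    return ["sku-1", "sku-2"]


async def recommendations_or_empty(timeout_s: float) -> list[str]:
    """Bound the wait; on timeout, free the worker and degrade to an empty list."""
    try:
        return await asyncio.wait_for(slow_recommendations(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return []
```

With a generous budget the caller gets the real answer; with a tight one it gets an empty list almost immediately instead of holding a coroutine slot for the full duration of the slow call.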

A timeout alone is brittle, though: it turns every transient blip into a hard failure. The fix is the retry, but retries are sharper than they look. You must retry only transient errors (a timeout, a connection reset, a 503) and never a 400 or 404, which will fail identically every time. You must cap the attempts. And you must space them with exponential backoff plus jitter: each retry waits longer than the last (1s, 2s, 4s) so you aren’t hammering a struggling service, and the jitter adds randomness so a thousand clients that failed at the same instant don’t all retry at the same instant — the thundering herd that retries can cause is its own outage. The tenacity library expresses this cleanly:

from tenacity import retry, stop_after_attempt, wait_exponential_jitter, retry_if_exception_type
import httpx

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential_jitter(initial=1, max=10),   # backoff + jitter, not a fixed delay
    retry=retry_if_exception_type((httpx.TimeoutException, httpx.NetworkError)),
)
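async def fetch_quote(client: httpx.AsyncClient, url: str) -> dict:
    """Retry only transport-level failures: timeouts and network errors.

    raise_for_status() raises httpx.HTTPStatusError, which the predicate above
    does not match, so 4xx responses fail immediately. A 503 does too; retrying
    on it would need an additional status-code check.
    """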
    resp = await client.get(url, timeout=2.0)
    resp.raise_for_status()
    return resp.json()
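
To see what a backoff-with-jitter schedule looks like as numbers, here is a hand-rolled sketch. It uses the "full jitter" variant (a uniform draw between zero and the capped exponential ceiling), which is an assumption of this example and not necessarily the formula tenacity applies; `backoff_delays` is a hypothetical helper.

```python
import random
from typing import Optional


def backoff_delays(
    retries: int,
    base: float = 1.0,
    cap: float = 10.0,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return the wait before each retry: exponential ceiling, randomized below it."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        ceiling = min(cap, base * 2 ** attempt)  # 1, 2, 4, 8, then capped at 10
        # Jitter: clients that failed together will not retry together.
        delays.append(rng.uniform(0.0, ceiling))
    return delays
```

Two clients calling `backoff_delays(3)` at the same instant will almost certainly get different schedules, which is what breaks up the thundering herd.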

There is a trap inside retries: they are only safe on idempotent operations. A GET is naturally idempotent — fetching twice is harmless. But retrying a “charge the customer” call can charge them twice, which is why state-changing operations need an idempotency key (covered below) before you are allowed to retry them at all.

Retries handle a blip. They make things worse for a sustained outage: if a service has been down for two minutes, retrying every call three times just triples the load on the dead service and ties up three times as many of your own workers waiting. This is where the circuit breaker comes in, and it is the centerpiece pattern. A circuit breaker wraps a downstream call and watches its failure rate. While calls succeed it is closed and traffic flows normally. Once failures cross a threshold — say five in a row — it opens, and from then on calls fail instantly, without even attempting the network, for a cooldown period. After the cooldown it goes half-open, letting a single trial call through: if it succeeds the circuit closes; if it fails the circuit opens again. The breaker is what stops the cascade. Instead of a thousand doomed calls piling up against a dead service and exhausting your workers, the breaker trips and those calls return immediately — ideally with a fallback (cached data, a default, an empty list) so the user sees graceful degradation rather than an error. In Figure 11.1 the Recommendation service is failing, and the open breaker on that edge is why the Order service keeps serving: recommendations come back empty instead of taking checkout down with them.
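
The three states can be sketched in a few dozen lines. This is a simplified, single-threaded illustration under stated assumptions: it counts consecutive failures, it is not safe for concurrent callers (several could slip through in the half-open state), and `CircuitBreaker` and `CircuitOpenError` are hypothetical names, not a library API. The clock is injectable so the cooldown can be exercised without waiting.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown_s:
            return "half-open"
        return "open"

    def call(self, func: Callable, *args, **kwargs):
        if self.state == "open":
            # Fail instantly: no network attempt, no worker held.
            raise CircuitOpenError("dependency presumed down")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed half-open trial, or too many failures in a row, opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would typically catch `CircuitOpenError` and return the fallback (an empty list of recommendations, for instance) rather than surface the error to the user.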

The fourth pattern is the bulkhead, named for the watertight compartments that keep one breached section of a ship’s hull from flooding the whole vessel. The idea is to partition your resources per dependency so one slow downstream can only exhaust its own slice. Give the Recommendation client a pool of at most ten concurrent calls (an asyncio.Semaphore is enough), and even if all ten hang, your calls to Payments and Users — in their own pools — are untouched. Without bulkheads a single slow dependency drains a shared connection pool and starves every other call in the process; with them the damage is fenced into one compartment. Together these four patterns — timeout, retry-with-backoff-and-jitter, circuit breaker, bulkhead — are how you make a synchronous edge survivable.
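
A bulkhead built on `asyncio.Semaphore` might look like the following sketch. The `Bulkhead` class and its `peak` counter are hypothetical, added only so the cap can be observed; the semaphore is created inside the running event loop to stay compatible with older Python versions.

```python
import asyncio


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot drain shared capacity."""

    def __init__(self, max_concurrent: int) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for demonstration only

    async def run(self, coro_fn, *args):
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await coro_fn(*args)
            finally:
                self.in_flight -= 1


async def demo() -> int:
    recommendations = Bulkhead(max_concurrent=2)

    async def slow_call(i: int) -> int:
        await asyncio.sleep(0.01)
        return i

    # Ten callers arrive at once; only two are ever inside the compartment.
    await asyncio.gather(*(recommendations.run(slow_call, i) for i in range(10)))
    return recommendations.peak
```

This version makes excess callers wait their turn. A stricter variant would reject them immediately when the compartment is full, which pairs naturally with a fallback.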

War story: the missing timeout that became a site-wide outage

A retailer’s product-page service called its recommendations service on every page load — synchronously, to render a “you might also like” carousel, a feature nobody would miss if it vanished. The call had no timeout. One afternoon a slow query made the recommendations service take eight seconds to respond instead of eighty milliseconds. The product service didn’t crash; it did something worse. Each incoming request made the eight-second call and held its worker the whole time, so within ninety seconds every worker in the product-service pool was blocked waiting on recommendations. New requests queued, then timed out at the load balancer. The services in front of product — checkout, the gateway — started failing too, because their synchronous calls to product now hung. A non-critical carousel had taken down the entire site. The fix was three lines that should have been there from the start: a one-second timeout on the call, a circuit breaker so repeated slow calls stopped being attempted, and a fallback that rendered the page with an empty carousel. The lesson is blunt — a synchronous call without a timeout is a latent outage, and the blast radius of a missing timeout is never the one feature you forgot to protect; it is everything upstream of it.

Build it → These resilience patterns in a real multi-service stack: Project 02: Microservice Platform runs a fleet of gRPC services behind a Kong API gateway — the gateway, routing, and service-to-service calls of this chapter at production shape — and Project 29: Model Routing Layer is a gateway/routing tier with timeouts, retries, and breakers on flaky downstream model backends.

Distributed concerns in brief

Three more realities arrive with distribution. Each deserves its own treatment, but you should know their shape so you recognize them before they bite.

Consistency becomes a saga. Because no transaction can span two services, a business operation touching several of them — create order, reserve payment, reserve inventory — cannot be wrapped in one ACID transaction the way it could in a monolith. The saga pattern replaces it with a sequence of local transactions, each with a compensating transaction that undoes it if a later step fails: if inventory can’t be reserved, you run “release the payment” and “cancel the order” to walk the system back to consistency. Sagas come in two flavors — choreography (services react to each other’s events, decentralized but hard to trace) and orchestration (a central coordinator drives the steps, easier to follow but a single point of control). They are powerful and genuinely complex, which is itself an argument for not splitting until you must: a monolith gets this for free.
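
The orchestration flavor can be sketched as a loop over (action, compensation) pairs. This is a deliberately small illustration: `run_saga` is a hypothetical helper, the steps are plain callables standing in for calls to other services, and a real orchestrator would also persist its progress and retry compensations that fail.

```python
from typing import Callable

# Each step pairs a local transaction with the compensation that undoes it.
Step = tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, undo the completed ones in reverse order."""
    completed: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

If the third step (reserving inventory) raises, the function runs "release the payment" and then "cancel the order", the same walk-back described above. The failed step's own compensation is not run, because that step never completed.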

Idempotency is mandatory, not optional. The network can deliver a message zero, one, or many times, and retries actively create duplicates. Every state-changing operation must therefore be safe to apply twice. The standard tool is the idempotency key: the caller attaches a unique key, the server records “I already processed this key” with its result, and a duplicate request returns the stored result instead of charging the card again. Without idempotency, the retries you added for resilience become a correctness bug.
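
The server side of an idempotency key fits in a few lines. This sketch keeps everything in memory, which is an assumption made for brevity: a real service would store the key and the result durably, in the same transaction as the charge itself. `PaymentService` and its fields are hypothetical names.

```python
class PaymentService:
    """In-memory sketch of idempotent charging keyed by a caller-supplied key."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, customer_id: str, amount_cents: int) -> dict:
        if idempotency_key in self._results:
            # Duplicate delivery or client retry: replay the stored result, charge nothing.
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {
            "charge_id": f"ch_{self.charges_made}",
            "customer_id": customer_id,
            "amount_cents": amount_cents,
        }
        self._results[idempotency_key] = result
        return result
```

The caller generates the key once per logical operation and reuses it on every retry, so however many times the request arrives, the customer is charged once.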

Distributed tracing is non-negotiable. In a monolith, a stack trace tells you the whole story of a request. Across services that story shatters into a dozen log files on a dozen machines, and “why was this one request slow?” becomes unanswerable unless you planned for it. The minimum is a correlation ID generated at the gateway and propagated through every downstream call and event, so you can grep one request’s entire journey; the mature version is full distributed tracing (OpenTelemetry and the like). The moment you go distributed, observability is the only way you will ever debug the system again — which is why it has a chapter of its own.
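
The minimum version, a correlation ID carried through every hop, can be sketched with `contextvars` from the standard library. The header name `X-Correlation-ID` is a common convention rather than a standard, and `accept_request`, `outbound_headers`, and `log` are hypothetical helpers for illustration.

```python
import contextvars
import uuid

HEADER = "X-Correlation-ID"  # assumed header name; a convention, not a standard

# Holds the ID for the request currently being handled in this context.
correlation_id = contextvars.ContextVar("correlation_id", default="")


def accept_request(headers: dict) -> str:
    """At the edge: reuse the caller's ID if present, otherwise mint a new one."""
    cid = headers.get(HEADER) or uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


def outbound_headers() -> dict:
    """Attach the current ID to every downstream call and published event."""
    return {HEADER: correlation_id.get()}


def log(message: str) -> str:
    """Prefix log lines with the ID so one request can be found across services."""
    return f"[cid={correlation_id.get()}] {message}"
```

Every service repeats the same two moves: read the ID on the way in, attach it on the way out. Searching the combined logs for one ID then reconstructs the request's whole journey.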

Practical exercise

Difficulty: Level I · Level II · Level III

Level I — Split and find the new failure modes. Take a small monolith with one endpoint that does two things in-process (e.g. an order endpoint that also looks up the user record). Split it into two services that talk over HTTP, with the order service calling the user service. Get it working, then deliberately make the user service slow (add a sleep) and then stop it entirely. Write down every new failure mode the split introduced that did not exist in the monolith — the hang, the partial failure, the dependency on startup order — and note which line of in-process code each one replaced.
Level II — Defend the edge and prove it contains a cascade. Add a timeout, a retry with exponential backoff and jitter, and a circuit breaker to that flaky downstream call, with a fallback when the circuit is open. Then run a small load test against the order service while the user service is down. Demonstrate, with numbers, that the breaker keeps the order service responsive (failing fast with the fallback) instead of exhausting its workers and going down too. Explain what each of the three defenses contributed and what would have happened with any one of them missing.
Level III — Design a decomposition and defend it. Given a described system (say, a food-delivery app: users, restaurants, orders, payments, delivery tracking, notifications), produce a decomposition. Draw the service boundaries along domain seams and justify each one against the single-responsibility and own-your-data rules. For every edge, choose synchronous or asynchronous communication and say why. Specify the gateway’s responsibilities and a concrete resilience policy (timeouts, retry budgets, breaker thresholds, where bulkheads go). Then make the hardest argument of all: state whether you would actually split this system at all at the given scale, or keep it a modular monolith — and defend your answer with the monolith-first rule.

Summary

Microservices trade the in-process simplicity of a monolith — no partial failure, no network, free transactions — for two specific gains: independent deployability and independent scalability. That trade carries a permanent tax of distributed-systems complexity, so it is an organizational and operational decision, not a default, and a well-structured monolith is the right place to start. When you do split, you draw boundaries along business domains rather than technical layers, give each service its own private database, and let it expose only a published contract — extracting services from seams that production has already proven rather than guessing at them up front. Services talk synchronously when they need an answer now and asynchronously, through events, when they merely need to broadcast that something happened; the choice is really a choice about coupling. An API gateway gives the fleet one front door for routing, auth, and rate limiting. And because the network is hostile, every synchronous edge must be defended — with timeouts, retries that back off and jitter, circuit breakers, and bulkheads — so that one service’s failure is contained instead of cascading into the outage that defines the distributed monolith.

Key takeaways

Microservices buy independent deploy-and-scale at the price of distributed-systems complexity; if you can’t name the deploy/scale/team-autonomy win, stay a monolith.
Boundaries are the expensive decision: split along business domains, give each service its own data, and extract from proven seams — never split along technical layers.
Synchronous coupling is coupling in time; one slow service can stall everyone who calls it synchronously. Events decouple in time at the cost of eventual consistency.
A synchronous call without a timeout is a latent site-wide outage; the timeout is the precondition for every other resilience pattern.
The circuit breaker is what stops a cascade — it fails fast against a dead dependency instead of letting doomed calls exhaust your workers; pair it with a fallback for graceful degradation.
Distribution forces sagas for consistency, idempotency keys for correctness under retries, and distributed tracing for debuggability — none of them optional once you go distributed.

Connections to other chapters

Python: Web Development (prerequisite): each service in this chapter is exactly one of the web services that chapter teaches you to build — the request lifecycle, validation, and dependency injection are assumed here. This chapter is what happens when you have many of them and they have to cooperate over a network.
Python: Design Patterns (prerequisite): the Dependency Rule and the discipline of seams are what make a service extractable in the first place. A module hidden behind a clean interface can be lifted out into a service; a tangle of cross-references cannot. Good boundaries in the monolith are the raw material for good services.
Orchestration with Kubernetes (extension): this chapter is about designing a fleet of services; Kubernetes is how a fleet is actually run — scheduling each service’s containers across many hosts, restarting the ones that die, rolling out new versions, and providing the service discovery that lets user-service:8000 resolve at all. The resilience patterns here and the self-healing there are complementary layers of the same goal.
Observability (extension): the correlation IDs and distributed tracing mentioned in passing here are a discipline in their own right. The moment you go distributed, debugging requires tracing — a single stack trace no longer exists — which is why observability is non-optional for any real microservice system, and gets its own treatment.

Further reading

Sam Newman, Building Microservices (2nd ed., O’Reilly) — the canonical, balanced treatment of boundaries, communication, and the organizational realities of splitting.
Martin Fowler, “MonolithFirst” — the short, sharp argument for starting with a monolith and extracting services from proven seams, and the failure mode of premature decomposition.

Deep dives

Michael Nygard, Release It! (2nd ed.) — the source text for the resilience patterns in this chapter: timeouts, circuit breakers, bulkheads, and the failure modes they defend against, with hard-won production stories.
Martin Fowler, “Microservices” — the foundational article defining the style, its characteristics, and its tradeoffs against the monolith.

Historical context

Garcia-Molina and Salem, “Sagas” (SIGMOD, 1987) — the original paper that introduced the saga as an alternative to long-lived distributed transactions, decades before microservices made it standard practice.

Chapter difficulty: advanced · Estimated time: 4-5 hours
You do not make that expedition unarmed. You go with a *timeout* so you don't wait forever, a *retry* so a single dropped packet isn't fatal, and a *circuit breaker* so you stop knocking on a door that clearly isn't going to open. Most of the craft of microservices is learning to defend that boundary. ### When to go microservices (and when a monolith wins) The honest framework starts from a presumption *against* splitting, and asks what would justify it. @fig-microservices shows the topology you are signing up to operate; the decision is whether that complexity buys you anything. **Split into services** when independent deployability or scalability is a concrete, present pain: when *multiple teams* collide in a shared codebase and the coordination cost is real; when *components have genuinely different scaling profiles* and cloning the whole app to scale one part wastes money; when parts of the system have *clear, stable bounded contexts* already proven through months of production use. These are the conditions under which the distribution tax pays for itself. **Stay a monolith** — ideally a *modular* one — when the team is small (coordination is a conversation, not a protocol), when the domain is still unclear (any boundary you draw is a guess you'll regret), when you need *strong consistency* (in-process ACID is trivial; distributed sagas are not), or when DevOps maturity isn't there yet (microservices demand CI/CD, observability, and on-call as table stakes). The load-bearing rule beneath all of this is **monolith-first**: you can always extract a well-designed module into a service once production shows you the boundary, but you cannot easily *un-split* a bad one — that means merging two deployed systems and their data, far harder than the split was. When in doubt, keep it in the monolith and invest in module design. 
### What you'll learn - How to decide whether to adopt microservices at all, using independent deploy-and-scale as the test and "monolith-first" as the default - Where to draw service boundaries — along business domains, not technical layers — and how to extract services from proven seams rather than guessing - When to communicate synchronously (REST, gRPC) versus asynchronously (events, messaging), and how that choice changes coupling and resilience - What an API gateway is for — the single front door that handles routing, auth, and rate limiting so individual services don't each reinvent them - How to defend a service against partial failure with timeouts, retries with backoff and jitter, circuit breakers, and bulkheads — and how those patterns stop a cascade - Why distributed consistency forces sagas and idempotency on you, and why distributed tracing stops being optional the moment you go distributed ### Prerequisites - **Python: Web Development** — every service in this chapter *is* a single web service. We assume you can already build one: request lifecycle, validation with Pydantic, dependency injection, running it under an ASGI server. This chapter does not re-teach any of that; it is about what happens when you have *many* of them. - **Python: Design Patterns** — the Dependency Rule and the idea of seams. The same discipline that lets you swap a database behind an interface is what lets you later lift a module out into its own service. - Comfort with HTTP and async Python (`async`/`await`), since cross-service calls are almost always I/O-bound and concurrent. --- ## Service boundaries and decomposition The single most consequential decision in a microservice system is where the boundaries go, because boundaries are the one thing that is brutally expensive to change later. Drawn well, a service is a small, autonomous department that can evolve on its own. 
Drawn badly, your services are just a monolith's modules with network latency bolted between them — and you have made everything harder while making nothing better. The cardinal mistake is to split along **technical layers**: a "controller service," a "business-logic service," a "data-access service." It feels tidy because it matches how the code is organized, and it is exactly wrong. A single user-facing feature — "place an order" — now traverses all three services, so every feature change touches every service and the services are useless without each other. You have maximized coupling across boundaries, the opposite of what boundaries are for. The right axis is the **business domain**. A service should own one bounded context — Orders, Payments, Inventory, Users — chosen so that a change to a business capability lands inside a *single* service most of the time. The test is cohesion: things that change together should live together. A well-drawn Order service can change how it stores or prices an order without anyone noticing, because the only thing it exposes is its contract. Three rules make a boundary real rather than cosmetic: **single responsibility** (one reason to change), **owning your data** (the service's database is private — no other service reads its tables), and **communicating only through published contracts** (APIs and events, never shared internals). The database-per-service rule is the one people break first and regret most. The moment two services read the same table they are coupled through its schema; a column rename becomes a cross-team migration, and you have rebuilt the distributed monolith through the back door. How do you *find* these boundaries? Not on a whiteboard up front, where they are guesses. You find them by running a modular monolith and watching where the natural seams fall: which modules change together, which scale together, which one team would happily own. 
A seam that has stayed stable for months, that one team would own end to end, and that has a narrow interface to the rest of the system, is a seam you can extract with confidence. This is the **seam-extraction** approach, and it is why monolith-first is not timidity — it is how you earn the information you need to draw boundaries you won't have to redraw. ## Inter-service communication Once you have more than one service, they have to talk, and the first real fork in the road is *how*. There are two families, and the choice is not a detail — it determines how tightly your services are coupled and how gracefully the system degrades when one of them is unhealthy. **Synchronous** communication is request/response: A calls B and waits for the answer before continuing. Over HTTP/REST it is universal, easy to debug with `curl`, and human-readable; over **gRPC** it is faster and strongly typed, using Protocol Buffers as a binary contract over HTTP/2 — the usual choice for chatty internal calls where performance matters and a browser is never the client. Either way, the defining property is the *wait*. Synchronous calls give you a simple mental model and immediate consistency, but they create **temporal coupling**: A cannot make progress unless B is up and answering *right now*. That is the property that turned one slow service into a site-wide outage in the opening story. Every synchronous edge is a thread by which one service's bad day can be pulled into another's. A light illustrative client shows the shape — note that the resilience knobs, not the happy path, are the interesting part: ```python import httpx async def get_user(user_id: int) -> dict: """Fetch a user from the User service over its published HTTP contract. The timeout is the load-bearing argument: without it, a slow User service would hold this coroutine — and its caller — open indefinitely. 
""" async with httpx.AsyncClient(timeout=2.0) as client: resp = await client.get(f"http://user-service:8000/api/v1/users/{user_id}") resp.raise_for_status() return resp.json() ``` **Asynchronous** communication breaks the wait. Instead of calling B directly, A publishes an *event* — "OrderCreated" — to a message bus and moves on; B (and C, and D) consume it later, on their own schedule. The producer does not know or care who is listening, which is the deepest decoupling available: A has no dependency on B being up, and you can add a fourth consumer without A ever changing. This is what makes event-driven systems resilient — a consumer can be down for an hour and catch up when it returns — and naturally scalable, since you add throughput by adding consumers. The cost is that you trade immediate consistency for **eventual consistency** and an easy-to-follow call stack for an asynchronous flow that is genuinely harder to debug. A useful default: queries that need an answer now are synchronous; state-change notifications that others merely need to *know about* — "this happened" — should almost always be events. The lesson underneath both is that **coupling is the thing you are managing**. Synchronous calls couple in time; shared databases couple in schema; even events couple you to their payload format. The goal is never zero coupling — services have to cooperate — but to make the coupling explicit, narrow, and at the contract, where it can be versioned and defended. ## The API gateway If every client — browser, mobile app, partner integration — had to know the address of every service, call each one directly, and implement authentication, rate limiting, and TLS against each, you would have leaked your entire internal topology to the outside world and duplicated your cross-cutting concerns across every service. The **API gateway** is the answer: a single front door that sits between clients and the fleet, as shown in @fig-microservices. 
![A microservice topology: an API gateway fronts several independently-deployed services, each owning its data; synchronous calls carry timeouts, retries, and circuit breakers so one slow service is contained instead of cascading, while some edges go through asynchronous messaging.](../assets/diagrams/rendered/py_microservices.svg){#fig-microservices .lightbox} The gateway handles, in one place, the concerns every service would otherwise reimplement: **routing** (mapping `/orders` to the Order service and `/users` to the User service, so clients see one coherent API instead of a dozen hostnames), **authentication** (verifying the caller's token once, at the edge, so downstream services can trust an already-authenticated request), **rate limiting** (shedding abusive traffic before it reaches your services), and TLS termination. It is the seam between the messy outside world and your internal network. Because it is a single chokepoint it must be kept thin and well-tested — a bug or outage there takes down everything behind it. The gateway routes and enforces; it should not contain business logic, or you have invented a new monolith at the worst possible layer. ## Resilience: defending against partial failure This is the section that separates microservices that work from the distributed monolith that takes down your site. Inside a process, a function call cannot fail partway through. Across the network, *every* call can: it can be slow, fail outright, or fail for one request while succeeding for the next. The discipline of resilience is the set of defenses you put on each synchronous edge so one service's failure stays *contained* instead of cascading through everyone who depends on it. There are four patterns, and they layer. The first and most important is the **timeout**. 
A call with no timeout is the single most dangerous line of code in a distributed system, because a slow dependency doesn't just fail your one request — it holds your worker (a thread, a connection, a coroutine slot) hostage for as long as the dependency stays slow. Under load those held workers accumulate until you have none left, and now *you* are down, not because you broke but because something downstream did. A timeout converts "hang forever" into "fail fast" — the precondition for every other pattern, since you cannot retry, fall back, or open a circuit on a call that never returns. A timeout alone is brittle, though: it turns every transient blip into a hard failure. The fix is the **retry**, but retries are sharper than they look. You must retry only *transient* errors (a timeout, a connection reset, a 503) and never a 400 or 404, which will fail identically every time. You must cap the attempts. And you must space them with **exponential backoff plus jitter**: each retry waits longer than the last (1s, 2s, 4s) so you aren't hammering a struggling service, and the jitter adds randomness so a thousand clients that failed at the same instant don't all retry at the same instant — the *thundering herd* that retries can cause is its own outage. The `tenacity` library expresses this cleanly: ```python from tenacity import retry, stop_after_attempt, wait_exponential_jitter, retry_if_exception_type import httpx @retry( stop=stop_after_attempt(3), wait=wait_exponential_jitter(initial=1, max=10), # backoff + jitter, not a fixed delay retry=retry_if_exception_type((httpx.TimeoutException, httpx.NetworkError)), ) async def fetch_quote(client: httpx.AsyncClient, url: str) -> dict: """Retry only transient failures; let 4xx errors fail immediately.""" resp = await client.get(url, timeout=2.0) resp.raise_for_status() return resp.json() ``` There is a trap inside retries: they are only safe on **idempotent** operations. 
A GET is naturally idempotent — fetching twice is harmless. But retrying a "charge the customer" call can charge them twice, which is why state-changing operations need an *idempotency key* (covered below) before you are allowed to retry them at all. Retries handle a *blip*. They make things worse for a *sustained* outage: if a service has been down for two minutes, retrying every call three times just triples the load on the dead service and ties up three times as many of your own workers waiting. This is where the **circuit breaker** comes in, and it is the centerpiece pattern. A circuit breaker wraps a downstream call and watches its failure rate. While calls succeed it is *closed* and traffic flows normally. Once failures cross a threshold — say five in a row — it *opens*, and from then on calls fail **instantly**, without even attempting the network, for a cooldown period. After the cooldown it goes *half-open*, letting a single trial call through: if it succeeds the circuit closes; if it fails the circuit opens again. The breaker is what stops the cascade. Instead of a thousand doomed calls piling up against a dead service and exhausting your workers, the breaker trips and those calls return immediately — ideally with a **fallback** (cached data, a default, an empty list) so the user sees graceful degradation rather than an error. In @fig-microservices the Recommendation service is failing, and the open breaker on that edge is why the Order service keeps serving: recommendations come back empty instead of taking checkout down with them. The fourth pattern is the **bulkhead**, named for the watertight compartments that keep one breached section of a ship's hull from flooding the whole vessel. The idea is to *partition your resources per dependency* so one slow downstream can only exhaust its own slice. 
Give the Recommendation client a pool of at most ten concurrent calls (an `asyncio.Semaphore` is enough), and even if all ten hang, your calls to Payments and Users — in their own pools — are untouched. Without bulkheads a single slow dependency drains a shared connection pool and starves every other call in the process; with them the damage is fenced into one compartment. Together these four patterns — timeout, retry-with-backoff-and-jitter, circuit breaker, bulkhead — are how you make a synchronous edge survivable. ::: {.callout-warning} ## War story: the missing timeout that became a site-wide outage A retailer's product-page service called its recommendations service on every page load — synchronously, to render a "you might also like" carousel, a feature nobody would miss if it vanished. The call had no timeout. One afternoon a slow query made the recommendations service take eight seconds to respond instead of eighty milliseconds. The product service didn't crash; it did something worse. Each incoming request made the eight-second call and held its worker the whole time, so within ninety seconds every worker in the product-service pool was blocked waiting on recommendations. New requests queued, then timed out at the load balancer. The services *in front of* product — checkout, the gateway — started failing too, because their synchronous calls to product now hung. A non-critical carousel had taken down the entire site. The fix was three lines that should have been there from the start: a one-second timeout on the call, a circuit breaker so repeated slow calls stopped being attempted, and a fallback that rendered the page with an empty carousel. The lesson is blunt — **a synchronous call without a timeout is a latent outage**, and the blast radius of a missing timeout is never the one feature you forgot to protect; it is everything upstream of it. 
::: > **Build it →** These resilience patterns in a real multi-service stack: > [Project 02: Microservice Platform](https://github.com/jchu0/applied-cs-projects/tree/main/02-microservice-platform) > runs a fleet of gRPC services behind a Kong API gateway — the gateway, routing, and > service-to-service calls of this chapter at production shape — and > [Project 29: Model Routing Layer](https://github.com/jchu0/applied-cs-projects/tree/main/29-model-routing-layer) > is a gateway/routing tier with timeouts, retries, and breakers on flaky downstream > model backends. ## Distributed concerns in brief Three more realities arrive with distribution. Each deserves its own treatment, but you should know their shape so you recognize them before they bite. **Consistency becomes a saga.** Because no transaction can span two services, a business operation touching several of them — create order, reserve payment, reserve inventory — cannot be wrapped in one ACID transaction the way it could in a monolith. The **saga** pattern replaces it with a sequence of local transactions, each with a *compensating* transaction that undoes it if a later step fails: if inventory can't be reserved, you run "release the payment" and "cancel the order" to walk the system back to consistency. Sagas come in two flavors — *choreography* (services react to each other's events, decentralized but hard to trace) and *orchestration* (a central coordinator drives the steps, easier to follow but a single point of control). They are powerful and genuinely complex, which is itself an argument for not splitting until you must: a monolith gets this for free. **Idempotency is mandatory, not optional.** The network can deliver a message zero, one, or many times, and retries actively create duplicates. Every state-changing operation must therefore be safe to apply twice. 
The standard tool is the **idempotency key**: the caller attaches a unique key, the server records "I already processed this key" with its result, and a duplicate request returns the stored result instead of charging the card again. Without idempotency, the retries you added for resilience become a correctness bug. **Distributed tracing is non-negotiable.** In a monolith, a stack trace tells you the whole story of a request. Across services that story shatters into a dozen log files on a dozen machines, and "why was this one request slow?" becomes unanswerable unless you planned for it. The minimum is a **correlation ID** generated at the gateway and propagated through every downstream call and event, so you can grep one request's entire journey; the mature version is full distributed tracing (OpenTelemetry and the like). The moment you go distributed, observability is the only way you will ever debug the system again — which is why it has a chapter of its own. --- ## Practical exercise **Difficulty:** Level I · Level II · Level III 1. **Level I — Split and find the new failure modes.** Take a small monolith with one endpoint that does two things in-process (e.g. an order endpoint that also looks up the user record). Split it into two services that talk over HTTP, with the order service calling the user service. Get it working, then deliberately make the user service slow (add a `sleep`) and then stop it entirely. Write down every new failure mode the split introduced that did not exist in the monolith — the hang, the partial failure, the dependency on startup order — and note which line of in-process code each one replaced. 2. **Level II — Defend the edge and prove it contains a cascade.** Add a timeout, a retry with exponential backoff and jitter, and a circuit breaker to that flaky downstream call, with a fallback when the circuit is open. Then run a small load test against the order service while the user service is down. 
Demonstrate, with numbers, that the breaker keeps the order service responsive (failing fast with the fallback) instead of exhausting its workers and going down too. Explain what each of the three defenses contributed and what would have happened with any one of them missing. 3. **Level III — Design a decomposition and defend it.** Given a described system (say, a food-delivery app: users, restaurants, orders, payments, delivery tracking, notifications), produce a decomposition. Draw the service boundaries along domain seams and justify each one against the single-responsibility and own-your-data rules. For every edge, choose synchronous or asynchronous communication and say why. Specify the gateway's responsibilities and a concrete resilience policy (timeouts, retry budgets, breaker thresholds, where bulkheads go). Then make the hardest argument of all: state whether you would actually split this system at all at the given scale, or keep it a modular monolith — and defend your answer with the monolith-first rule. ## Summary Microservices trade the in-process simplicity of a monolith — no partial failure, no network, free transactions — for two specific gains: independent deployability and independent scalability. That trade carries a permanent tax of distributed-systems complexity, so it is an organizational and operational decision, not a default, and a well-structured monolith is the right place to start. When you do split, you draw boundaries along business domains rather than technical layers, give each service its own private database, and let it expose only a published contract — extracting services from seams that production has already proven rather than guessing at them up front. Services talk synchronously when they need an answer now and asynchronously, through events, when they merely need to broadcast that something happened; the choice is really a choice about coupling. 
An API gateway gives the fleet one front door for routing, auth, and rate limiting. And because the network is hostile, every synchronous edge must be defended — with timeouts, retries that back off and jitter, circuit breakers, and bulkheads — so that one service's failure is contained instead of cascading into the outage that defines the distributed monolith. ### Key takeaways - Microservices buy independent deploy-and-scale at the price of distributed-systems complexity; if you can't name the deploy/scale/team-autonomy win, stay a monolith. - Boundaries are the expensive decision: split along business domains, give each service its own data, and extract from proven seams — never split along technical layers. - Synchronous coupling is coupling in *time*; one slow service can stall everyone who calls it synchronously. Events decouple in time at the cost of eventual consistency. - A synchronous call without a timeout is a latent site-wide outage; the timeout is the precondition for every other resilience pattern. - The circuit breaker is what stops a cascade — it fails fast against a dead dependency instead of letting doomed calls exhaust your workers; pair it with a fallback for graceful degradation. - Distribution forces sagas for consistency, idempotency keys for correctness under retries, and distributed tracing for debuggability — none of them optional once you go distributed. ### Connections to other chapters - **Python: Web Development** (prerequisite): each service in this chapter is exactly one of the web services that chapter teaches you to build — the request lifecycle, validation, and dependency injection are assumed here. This chapter is what happens when you have many of them and they have to cooperate over a network. - **Python: Design Patterns** (prerequisite): the Dependency Rule and the discipline of seams are what make a service *extractable* in the first place. 
A module hidden behind a clean interface can be lifted out into a service; a tangle of cross-references cannot. Good boundaries in the monolith are the raw material for good services. - **Orchestration with Kubernetes** (extension): this chapter is about *designing* a fleet of services; Kubernetes is how a fleet is actually *run* — scheduling each service's containers across many hosts, restarting the ones that die, rolling out new versions, and providing the service discovery that lets `user-service:8000` resolve at all. The resilience patterns here and the self-healing there are complementary layers of the same goal. - **Observability** (extension): the correlation IDs and distributed tracing mentioned in passing here are a discipline in their own right. The moment you go distributed, debugging *requires* tracing — a single stack trace no longer exists — which is why observability is non-optional for any real microservice system, and gets its own treatment. ## Further reading ### Essential - Sam Newman, *Building Microservices* (2nd ed., O'Reilly) — the canonical, balanced treatment of boundaries, communication, and the organizational realities of splitting. - Martin Fowler, *"MonolithFirst"* — the short, sharp argument for starting with a monolith and extracting services from proven seams, and the failure mode of premature decomposition. ### Deep dives - Michael Nygard, *Release It!* (2nd ed.) — the source text for the resilience patterns in this chapter: timeouts, circuit breakers, bulkheads, and the failure modes they defend against, with hard-won production stories. - Martin Fowler, *"Microservices"* — the foundational article defining the style, its characteristics, and its tradeoffs against the monolith. ### Historical context - Garcia-Molina and Salem, *"Sagas"* (SIGMOD, 1987) — the original paper that introduced the saga as an alternative to long-lived distributed transactions, decades before microservices made it standard practice.