Python: Web Development

Keywords

fastapi, asgi, async, pydantic, dependency injection, rest api, validation, uvicorn, web framework, request lifecycle

Introduction

The service was a textbook Flask app — a few dozen routes, a Postgres database, synchronous Gunicorn workers behind nginx. It served an internal dashboard fine for two years. Then the product team wired it to a third-party billing API, and every page that rendered a customer’s invoices now made an outbound HTTP call that took, on a good day, four hundred milliseconds. Nobody thought much of it; four hundred milliseconds is fast.

The trouble showed up the first morning the sales team logged in together. The app ran sixteen synchronous workers, and each worker does exactly one thing at a time. When a worker picks up a request that calls the billing API, it sits there — holding the worker, doing nothing — for the full round trip. With sixteen workers, the service could handle about forty such requests per second before every worker was blocked, parked on a socket. The CPUs were idle, the database was bored, and yet new requests queued behind a wall of workers all asleep on the network, and the dashboard started returning 502s. The bottleneck wasn’t compute or the database; it was that a synchronous worker blocked on I/O is a worker doing nothing while pretending to be busy.
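The forty-requests-per-second figure is simple arithmetic, sketched here as a sanity check. The function name is illustrative; the inputs are the numbers from the story.

```python
# Back-of-the-envelope ceiling for a pool of synchronous workers.
WORKERS = 16             # synchronous workers in the story
BLOCKED_SECONDS = 0.4    # billing round trip on a good day


def max_requests_per_second(workers: int, seconds_per_request: float) -> float:
    """Each blocked worker completes 1 / seconds_per_request requests per second."""
    return workers / seconds_per_request


ceiling = max_requests_per_second(WORKERS, BLOCKED_SECONDS)
print(f"ceiling: about {ceiling:.0f} requests/second")
```

Adding workers raises the ceiling only linearly, and each one costs a full process, which is why the fix in this chapter changes the concurrency model instead.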

The same team had a second, quieter problem. Every endpoint hand-parsed its own request body — request.get_json(), a thicket of if "email" not in data checks, manual coercion of strings to integers. Half the 500s in the logs weren’t bugs in the business logic; they were a missing field or a string where an integer was expected, blowing up three layers deep instead of being rejected at the door. The validation was real code, untested, duplicated across forty handlers, subtly different in each.

Both problems share a root and a fix. A web framework’s actual job is to own the request lifecycle — turn a raw HTTP request into a typed Python call, run it without wasting a thread waiting on the network, validate what comes in and serialize what goes out — so you are left writing the part that is genuinely yours: the domain logic. The team’s pain was the framework’s job leaking into application code. This chapter is about doing that job well, for a single service. (When one service becomes many — service boundaries, inter-service calls, gateways — that is the sibling Python: Microservices chapter, and we point there rather than duplicate it.)

The Core Insight

A modern Python web service is built on two ideas that compose: ASGI and async. WSGI, the older standard that Flask and classic Django speak, is synchronous by construction — it hands a worker one request, the worker runs your handler start to finish, and only then is the worker free for the next request. That model is fine when handlers are CPU-bound and fast. It falls apart exactly when a handler waits on I/O, because the waiting is invisible to the worker pool: a blocked worker looks identical to a busy one.

ASGI — the Asynchronous Server Gateway Interface — is the async successor. It lets a single process run an event loop that juggles thousands of in-flight requests on one thread. When a handler hits an await on a network call, it doesn’t block the thread; it yields control back to the loop, which runs some other request’s code until the first one’s I/O completes. The four-hundred-millisecond billing call is no longer four hundred milliseconds of a worker doing nothing — it’s four hundred milliseconds during which the loop serves hundreds of other requests. For I/O-bound work, which is most web work, this changes the throughput ceiling by an order of magnitude on the same hardware.
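To make the overlap concrete, here is a minimal stdlib sketch that involves no web framework. `fake_billing_call` is a hypothetical stand-in for the outbound HTTP request, and the delay is shortened so the script finishes quickly.

```python
import asyncio
import time


async def fake_billing_call(delay: float) -> str:
    """Stand-in for an outbound HTTP call; asyncio.sleep yields to the loop."""
    await asyncio.sleep(delay)
    return "invoice"


async def serve_many(n_requests: int, delay: float) -> float:
    """Run n_requests 'handlers' concurrently on one thread; return wall time."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_billing_call(delay) for _ in range(n_requests))
    )
    assert len(results) == n_requests
    return time.perf_counter() - start


elapsed = asyncio.run(serve_many(n_requests=200, delay=0.05))
print(f"200 overlapping waits took {elapsed:.2f}s; one at a time would take 10s")
```

All two hundred waits share a single thread, so the total wall time stays close to one delay rather than the sum of them.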

The second half of the insight is that the framework hands validation and serialization to the type system. Instead of hand-parsing JSON, you declare a Pydantic model describing a valid request, and the framework parses the body into it — or rejects it with a precise 422 before your code runs. The same trick runs in reverse on the way out: you declare a response model, and the framework guarantees the response matches it, stripping any field you didn’t promise. The forty hand-written validators collapse into a handful of typed declarations. You write handlers; the framework owns the lifecycle.

A mental model

Think of the request lifecycle as an assembly line. A raw HTTP request enters at one end as unstructured bytes on a socket and leaves the other end as a structured response, passing through a fixed sequence of stations, each doing one job and handing the work along. The ASGI server is the intake — it pulls bytes off the wire and turns them into a request the rest of the line understands. The middleware chain wraps every request: add CORS headers, compress, log, check auth. Routing is the switch that sends the request to the right station for its path and method. Dependency injection is the parts feeder — it gathers what the handler needs (a database session, the authenticated user, configuration) and sets it beside the work. The handler is the one station that is yours: it does the domain logic and produces a result. Then the line runs in reverse — the result is serialized and threaded back out through the same middleware to the client.

The crucial detail the analogy captures: the framework is the conveyor and the stations are mostly off-the-shelf; your handler is one station on a long line you didn’t build. And because the line runs on an event loop, a station that has to wait — your handler await-ing the database — doesn’t stop the conveyor. The loop simply runs another request’s line while yours waits, which is why one process keeps thousands of requests moving at once. Figure 10.1 traces a single request down the line and back.

When to use what

Three frameworks dominate Python web work, and the choice is about the shape of service rather than which is “best.” Reach for FastAPI when you are building an API-first, async service: a JSON backend for a single-page app or mobile client, an ML model server, anything that spends its time on I/O. It is ASGI-native, async by default, and validation-by-types is its whole design; if your service is the shape this chapter describes, this is the default, and the code here is FastAPI. Reach for Django when you want batteries included and a server-rendered or admin-heavy app: a built-in ORM, a production-grade admin, sessions, forms, and auth out of the box. (Django has supported async views since 3.1 and an async ORM interface since 4.1, but its center of gravity is the synchronous full-stack world.) Reach for Flask when the service is small and synchronous and you value seeing every moving part: a prototype, a tiny internal endpoint, a glue service. Flask is WSGI and minimal by design, which is exactly why the opening story’s service hit a wall the moment it became I/O-bound under concurrency. The rest of this chapter assumes the first answer.

What you’ll learn

How an HTTP request travels the ASGI lifecycle — server, middleware, routing, dependency injection, handler, response — and where your code actually sits
Why async lets one process serve thousands of concurrent I/O-bound requests, and the one rule that breaks it: never block the event loop
How Pydantic turns request and response boundaries into typed contracts, and why “parse, don’t validate” makes your handler simpler and safer
How Depends injects auth, sessions, and config without globals, and why that makes handlers testable
The security basics of a single service: token auth, password hashing, and CORS
How to serve the app in production: gunicorn/uvicorn workers, the two axes of scaling, and graceful startup and shutdown

Prerequisites

Python: Language Features — decorators, type hints, and context managers; FastAPI and Pydantic are driven by annotations, and the lifespan manager is a context manager
Concurrency and Parallelism Models — async/await, coroutines, and the asyncio event loop, the engine this chapter runs on (covered there comparatively across all six languages)
Working knowledge of HTTP: methods, status codes, headers, and JSON bodies

The ASGI request lifecycle

Everything else in this chapter is a station on the line, so it pays to walk the line once, end to end. The diagram below is the map; the prose that follows is the tour.

At the top of the line sits the ASGI server — in practice, uvicorn. It owns the socket: it accepts the TCP connection, parses the raw HTTP bytes, and translates them into the ASGI protocol, a small contract of three objects (a scope dict describing the request, plus receive and send callables for streaming the body and response). Critically, uvicorn runs an asyncio event loop in that one process — often on uvloop, a fast libuv-based loop — and it is this loop that lets the single process keep many requests in flight. Your application never speaks raw HTTP; it speaks ASGI to uvicorn, and uvicorn speaks HTTP to the world.
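The three-object contract is small enough to exercise by hand. The sketch below defines a bare ASGI application and a hypothetical `call_once` helper that plays the server's role for a single request. A real server such as uvicorn builds a much richer `scope` and handles streaming, so treat this as an illustration of the shape only.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI application: no framework, just scope, receive, and send."""
    assert scope["type"] == "http"
    body = f"hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_once(asgi_app, path: str) -> list[dict]:
    """Hypothetical harness: act as the server for one GET request."""
    sent: list[dict] = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await asgi_app(scope, receive, send)
    return sent


messages = asyncio.run(call_once(app, "/ping"))
print(messages[0]["status"], messages[1]["body"])
```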

Below uvicorn, the request enters the middleware chain. Middleware wraps every request and response: it runs on the way in, before routing, and again on the way out, after the handler. This is where cross-cutting concerns live (CORS headers, GZip compression, request logging, broad auth checks), precisely the things you do not want scattered across forty handlers. FastAPI is built on Starlette, so the middleware is ASGI middleware underneath. Order matters: the outermost middleware sees the request earliest and the response latest, and with app.add_middleware() each new middleware wraps everything registered before it, so the one added last ends up outermost. Configure the full stack before the app starts serving; Starlette refuses to add middleware once the application has started.
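The nesting can be seen with hand-composed ASGI callables. This is a sketch of wrapping in general, not of how any framework registers middleware; `tagging_middleware` and the layer names are made up for the illustration.

```python
import asyncio

events: list[str] = []


def tagging_middleware(name: str):
    """Build a pure-ASGI wrapper that records when it is entered and left."""
    def wrap(inner):
        async def middleware(scope, receive, send):
            events.append(f"{name}: request in")
            await inner(scope, receive, send)
            events.append(f"{name}: response out")
        return middleware
    return wrap


async def handler(scope, receive, send):
    events.append("handler")


# Wrapping is function composition: the wrapper applied last is the outermost layer.
app = tagging_middleware("cors")(handler)     # inner layer
app = tagging_middleware("logging")(app)      # outer layer

asyncio.run(app({"type": "http"}, None, None))
print(events)
```

The outer layer is entered first and left last, which is the property that makes ordering matter for concerns such as logging and compression.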

Past middleware, the router matches the request’s path and method to exactly one handler. GET /users/42 and POST /users are different routes even though they share a prefix; the router resolves the path parameters (42 becomes user_id: int) and selects the function to run. If nothing matches, the router itself produces the 404 — your code never sees it.

Just before the handler runs, FastAPI resolves its dependencies — the parts feeder (a full section follows). It inspects the handler’s signature, sees what it needs (a database session, the current user, config), runs the callables that produce those values, and injects them as arguments. Then, and only then, the handler runs: it receives a fully typed, validated request and the resources it asked for, does the domain work — often await-ing the database or an external service — and returns a result. That result is serialized through the response model and threaded back up through the same middleware to uvicorn, which writes the HTTP response to the socket. One trip down the line, one trip back.

Async views and the “don’t block the loop” rule

The reason this lifecycle scales is that the handler runs on the event loop, and the loop’s defining property is that it never waits. When your handler hits await db.execute(...), the coroutine yields — “I have nothing to do until this I/O finishes, go run someone else.” The loop picks another request that does have work, runs it until it awaits, and so on. I/O wait is free; the loop fills it with other people’s work. This is cooperative multitasking, covered comparatively in the Concurrency and Parallelism Models chapter, and FastAPI is one long application of it.

There is exactly one way to ruin it, and it is the most common production mistake in async Python: blocking the loop. A coroutine yields control only at an await. If you make a synchronous blocking call inside an async def handler — a time.sleep(2), a requests.get(), a CPU-bound loop, a synchronous database driver — the loop cannot take control back, because you never gave it an await to take it at. For the full duration of that call, the whole worker is frozen — not just your request, but every concurrent request that worker was juggling. One blocking call serializes the entire process, and the symptom is maddening: latency spikes under load with the CPU near idle.
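The freeze is easy to reproduce without a web server. In this sketch a hypothetical `heartbeat` task stands in for every other request the worker is juggling, and the script measures the longest pause between its ticks while a handler runs.

```python
import asyncio
import time


async def heartbeat(ticks: list[float], interval: float, stop: asyncio.Event) -> None:
    """Stand-in for other in-flight requests: record a timestamp each interval."""
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def blocking_handler() -> None:
    time.sleep(0.3)            # synchronous: the loop cannot run anything else


async def cooperative_handler() -> None:
    await asyncio.sleep(0.3)   # yields: the loop keeps running the heartbeat


async def max_gap(handler) -> float:
    """Longest pause between heartbeat ticks while `handler` runs."""
    ticks: list[float] = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, 0.01, stop))
    await asyncio.sleep(0.05)  # let the heartbeat get going
    await handler()
    await asyncio.sleep(0.05)
    stop.set()
    await beat
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


blocked_gap = asyncio.run(max_gap(blocking_handler))
cooperative_gap = asyncio.run(max_gap(cooperative_handler))
print(f"worst stall: blocking={blocked_gap:.2f}s cooperative={cooperative_gap:.2f}s")
```

With the blocking handler the heartbeat stalls for roughly the whole sleep; with the cooperative one it keeps ticking at its normal interval.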

The fix has two parts. First, inside async def handlers, use async libraries for I/O: await asyncio.sleep(), httpx.AsyncClient, an async driver (asyncpg) with async SQLAlchemy. Second — the escape hatch — if you must call something blocking (no async version, or a genuinely CPU-bound step), push it onto a threadpool with run_in_threadpool, or simply declare the route as a plain def: FastAPI runs def handlers in a threadpool, off the loop, so they can’t freeze it. The rule of thumb: make a route async def only if it actually awaits something; if it does blocking work with no async option, make it def and let the framework move it off the loop.

import asyncio
import httpx
from fastapi import FastAPI

app = FastAPI()

@app.get("/quote")
async def get_quote() -> dict[str, float]:
    """Correct: an async client, awaited — the loop runs others while we wait."""
    async with httpx.AsyncClient(timeout=5.0) as client:
        resp = await client.get("https://example.com/price")  # yields here
    return {"price": resp.json()["price"]}

The handler above looks ordinary, but the await on the HTTP call is what keeps the process responsive: during that network round trip, this coroutine is parked and the loop is busy serving other requests. Swap httpx.AsyncClient for synchronous requests.get() and the line goes quiet — that one substitution is the difference between the service from the introduction and the service that replaced it.
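For the escape hatch, the same idea can be sketched with the standard library alone. The example uses `asyncio.to_thread` as a stand-in for Starlette's `run_in_threadpool` (an assumption made to keep the sketch dependency-free), and `legacy_blocking_lookup` is a hypothetical synchronous call with no async equivalent.

```python
import asyncio
import time


def legacy_blocking_lookup(key: str) -> str:
    """Hypothetical synchronous call with no async equivalent."""
    time.sleep(0.2)
    return f"value-for-{key}"


async def handler(key: str) -> str:
    # Offload to a worker thread; the event loop stays free while it runs.
    return await asyncio.to_thread(legacy_blocking_lookup, key)


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    results = await asyncio.gather(*(handler(k) for k in ("a", "b", "c", "d")))
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The four lookups overlap because each runs on its own thread. The threadpool is finite, though, so this is a relief valve for the occasional blocking call rather than a substitute for async I/O.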

Pydantic validation: typed boundaries

A web service sits at a trust boundary. Everything from the network is untrusted bytes; everything inside your handler should be trusted, typed Python. The job of the request boundary is to convert the former into the latter exactly once, at the edge, so the interior never wonders whether user_id is really an integer or email is really present. Pydantic is how FastAPI does this. You declare a model — a class describing fields, types, and constraints — and FastAPI uses it to parse the incoming JSON into a typed object. If the JSON doesn’t fit, the request is rejected with a 422 and a precise per-field error report, and your handler never runs.

This is the “parse, don’t validate” principle, and the distinction is sharp. Validation checks a value and leaves you holding the same untyped data, hoping you remember it passed. Parsing transforms it into a type that cannot be malformed — once you hold a UserCreate, the type system guarantees email is a valid email and password is at least eight characters, because an object violating those couldn’t have been constructed. The check happens once, at the door; the rest of the code is freed from defensive re-checking.
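The principle does not depend on any particular library. As a rough stdlib illustration, with a hypothetical `NewUser` type and a deliberately naive email check, parsing looks like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewUser:
    """A value meant to exist only if it passed the checks in parse()."""
    email: str
    password: str

    @classmethod
    def parse(cls, raw: dict) -> "NewUser":
        email = raw.get("email")
        password = raw.get("password")
        if not isinstance(email, str) or "@" not in email:   # naive check, for illustration
            raise ValueError("email: not a valid address")
        if not isinstance(password, str) or len(password) < 8:
            raise ValueError("password: must be at least 8 characters")
        return cls(email=email, password=password)


def register(user: NewUser) -> str:
    # No defensive re-checking: holding a NewUser is the evidence it was checked.
    return f"registered {user.email}"


ok = register(NewUser.parse({"email": "ada@example.com", "password": "correct horse"}))

try:
    NewUser.parse({"email": "ada@example.com", "password": "short"})
    rejected = None
except ValueError as exc:
    rejected = str(exc)

print(ok, "|", rejected)
```

A plain dataclass can still be constructed directly, bypassing `parse`, so this sketch relies on convention where a validating model enforces the checks at construction time.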

from pydantic import BaseModel, EmailStr, Field

class UserCreate(BaseModel):
    """The request contract: what a valid 'create user' body must contain."""
    email: EmailStr                                  # rejected at the door if malformed
    username: str = Field(min_length=3, max_length=50)
    password: str = Field(min_length=8)              # never appears in any response model

class UserOut(BaseModel):
    """The response contract: exactly what we promise to send back."""
    id: int
    email: EmailStr
    username: str
    model_config = {"from_attributes": True}         # build directly from an ORM object

The two models above are deliberately different, and that difference is the most important habit in the chapter. UserCreate has a password; UserOut does not. When a handler declares UserCreate as its body and UserOut as its response_model, FastAPI parses the request into the first and serializes the response through the second — so the password hash, internal flags, and any field not named in UserOut are physically incapable of reaching the client, even if you return a full ORM object. Declaring response_model is not optional politeness; it is the wall that stops internal fields from leaking over the wire. The endpoint that ties them together is small precisely because the models carry the weight:

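from fastapi import APIRouter, HTTPException, status

# `users` is assumed to be the application's user repository (defined elsewhere);
# UserCreate and UserOut are the models declared above.
router = APIRouter()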

@router.post("/users", response_model=UserOut, status_code=status.HTTP_201_CREATED)
async def create_user(body: UserCreate) -> UserOut:
    """`body` is already validated; the return is filtered through UserOut."""
    if await users.email_exists(body.email):
        raise HTTPException(status.HTTP_409_CONFLICT, "Email already registered")
    return await users.create(body)   # an ORM object; UserOut strips it to the contract

There is no if "email" not in data, no manual type coercion, no hand-written 422 response. The forty fragile validators from the introduction became two model classes and a response_model= argument, and as a bonus FastAPI generated an OpenAPI schema and interactive Swagger docs from those same declarations — the types are simultaneously the validator, the serializer, and the documentation.
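The filtering role of a response contract can be approximated in a few lines of standard-library code. This is an analogy for what a declared response model achieves, not FastAPI's implementation; `UserRow`, `PublicUser`, and `to_response` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class UserRow:
    """Hypothetical stand-in for an ORM object, internal fields included."""
    id: int
    email: str
    username: str
    password_hash: str
    is_flagged: bool


@dataclass
class PublicUser:
    """The response contract: only these fields may leave the service."""
    id: int
    email: str
    username: str


def to_response(row: object, contract: type) -> dict:
    """Copy only the fields the contract names; everything else is dropped."""
    return {f.name: getattr(row, f.name) for f in fields(contract)}


row = UserRow(1, "ada@example.com", "ada", "not-a-real-hash", False)
payload = to_response(row, PublicUser)
print(payload)
```

The contract is an allow-list: a field added to the row later stays private until someone deliberately adds it to the public type.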

Build it → See typed boundaries at production scale: the FastAPI services in Project 05: SaaS Web Platform and Project 50: Feature Engineering Platform use Pydantic request/response models and dependency-injected sessions throughout.

Dependency injection with `Depends`

The thing that most distinguishes FastAPI from Flask is how the handler gets its resources. A handler usually needs a database session, the authenticated user, and some configuration. The naive approach reaches for globals and pays for it in untestability and lifecycle bugs. FastAPI’s answer is dependency injection: the handler declares what it needs as typed parameters wrapped in Depends, and the framework produces those values per request and passes them in.

A dependency is just a callable. FastAPI inspects the handler’s signature at startup, sees each Depends(...), and builds a dependency graph — because dependencies can themselves depend on other dependencies. The classic chain is get_current_user depending on get_db (to look the user up) and on the bearer token (to identify them). FastAPI resolves the whole graph per request, caching any dependency used more than once within a request so it runs only once. The result is that a handler’s signature reads as a precise statement of its needs, each one supplied by the framework rather than a global.
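A toy resolver shows the two properties described above: dependencies form a graph, and each provider runs at most once per request. It is a synchronous sketch under simplifying assumptions, with a hypothetical `Needs` marker loosely analogous to `Depends`. It is not how FastAPI is implemented.

```python
import inspect


class Needs:
    """Hypothetical marker, loosely analogous to Depends(...)."""
    def __init__(self, provider):
        self.provider = provider


def resolve(func, cache: dict):
    """Call func after resolving every parameter whose default is a Needs marker."""
    kwargs = {}
    for name, param in inspect.signature(func).parameters.items():
        if isinstance(param.default, Needs):
            provider = param.default.provider
            if provider not in cache:        # run each provider once per request
                cache[provider] = resolve(provider, cache)
            kwargs[name] = cache[provider]
    return func(**kwargs)


calls: list[str] = []


def get_db():
    calls.append("get_db")
    return {"session": object()}


def get_current_user(db=Needs(get_db)):
    calls.append("get_current_user")
    return {"name": "ada", "db": db}


def handler(user=Needs(get_current_user), db=Needs(get_db)):
    return user["db"] is db      # both paths received the same session


shared_session = resolve(handler, cache={})   # a fresh cache for each request
print(shared_session, calls)
```

`get_db` appears twice in the graph but runs once, so the handler and the auth dependency share one session for the request.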

The most important pattern is the request-scoped resource, above all the database session. A dependency that yields gives you setup-and-teardown around the request: the code before yield runs before the handler, the value is injected, and the code after yield runs once the handler has finished, guaranteeing cleanup even if the handler raised. (Whether that teardown happens just before or just after the response is sent has changed between FastAPI releases, so check the behavior of the version you run.)
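The teardown guarantee comes from ordinary context-manager semantics, which a short stdlib sketch can show in isolation. `request_session` and `failing_handler` are hypothetical, and `serve_one` plays the framework's part of wrapping the handler.

```python
import asyncio
from contextlib import asynccontextmanager

log: list[str] = []


@asynccontextmanager
async def request_session():
    """Hypothetical request-scoped resource: open, hand over, always close."""
    log.append("open")
    try:
        yield "session"
    finally:
        log.append("close")      # runs even if the handler raised


async def failing_handler(session: str) -> None:
    log.append(f"handler using {session}")
    raise RuntimeError("boom")


async def serve_one() -> str:
    try:
        async with request_session() as session:
            await failing_handler(session)
    except RuntimeError as exc:
        return f"500: {exc}"
    return "200"


status_line = asyncio.run(serve_one())
print(status_line, log)
```

The resource is closed before the error is turned into a response, so a failing handler cannot leak it.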

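from collections.abc import AsyncGenerator
from typing import Annotated

from fastapi import Depends, HTTPException, status
from sqlalchemy.ext.asyncio import AsyncSession

# Assumed to be defined elsewhere in the application: async_session_factory,
# oauth2_scheme, the User model, and the users repository.

async def get_db() -> AsyncGenerator[AsyncSession, None]:
    """One session per request; closed automatically when the request ends."""
    async with async_session_factory() as session:
        yield session                # injected into the handler
        # once the request is finished, the context manager closes the session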

DbSession = Annotated[AsyncSession, Depends(get_db)]   # a reusable, named dependency

async def get_current_user(token: Annotated[str, Depends(oauth2_scheme)],
                           db: DbSession) -> User:
    """Depends on both the token and the DB — FastAPI wires the graph."""
    user = await users.from_token(db, token)
    if user is None:
        raise HTTPException(status.HTTP_401_UNAUTHORIZED, "Invalid credentials")
    return user

The payoff is twofold. In production, every request gets a fresh session closed exactly once, sidestepping a whole genus of pooling and stale-connection bugs, provided you obey one rule: a service or handler must never store the session as an attribute. Store one on a long-lived object and a later request ends up using a session that belonged to an earlier request and has already been closed, producing failures that are hard to trace back to their cause (the infamous Event loop is closed error is one way this kind of lifetime bug can surface). The second payoff is testing: because the handler asks for get_db rather than a global, a test can override that one dependency with a transactional test session and exercise the real handler against a real-but-disposable database. This is clean architecture in the small, with the framework enforcing the handler-to-domain seam. (That seam is the subject of the Python: Design Patterns chapter.)

Auth and security for one service

Authentication for a single service has three moving parts, and FastAPI’s dependency system makes each a clean unit. The first is password hashing. You never store passwords; you store a slow, salted hash and compare hashes on login. Use a purpose-built password hash (bcrypt via passlib is a common choice) so that even a breach doesn’t hand over plaintext, and so the hash is deliberately slow enough to make brute force impractical. Hashing on signup and verifying on login are two small functions, and they are what stands between a leaked table and a credential-stuffing disaster.

The second is tokens. After verifying a password at a login endpoint, the service issues a token — a JWT is common — that the client sends on later requests in the Authorization: Bearer header. The token is signed with a server-side secret, so the service can verify it without a database lookup, and it carries an expiry. The pattern that earns its keep is short-lived access tokens (15–30 minutes) paired with longer-lived refresh tokens: if an access token leaks, its window is small, and the client uses the refresh token to mint a new one without re-entering a password. Token verification then becomes a dependency — get_current_user from above — so every protected endpoint gets auth simply by declaring it needs the current user.
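The core mechanics of a signed, expiring token fit in a short stdlib sketch. This is deliberately not a JWT and not something to deploy; a real service would use a maintained JWT library. `SECRET`, `issue_token`, and `verify_token` are hypothetical names, and the `now` parameter exists only to make the expiry testable.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"change-me"   # hypothetical; a real service loads this from configuration


def _sign(payload: bytes) -> bytes:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest().encode()


def issue_token(subject: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return 'base64(payload).signature' with an expiry baked into the payload."""
    issued_at = time.time() if now is None else now
    payload = json.dumps({"sub": subject, "exp": issued_at + ttl_seconds}).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload).decode()


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the subject if the signature checks out and the token is unexpired."""
    try:
        body, signature = token.split(".")
        payload = base64.urlsafe_b64decode(body.encode())
    except ValueError:            # wrong shape or invalid base64
        return None
    if not hmac.compare_digest(signature.encode(), _sign(payload)):
        return None               # payload was altered or signed with another key
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims["sub"] if current < claims["exp"] else None


token = issue_token("ada", ttl_seconds=900, now=1_000.0)   # a 15-minute access token
print(verify_token(token, now=1_500.0), verify_token(token, now=2_000.0))
```

Verification needs only the secret and the clock, with no database lookup, which is what lets it run as a cheap dependency on every protected request.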

from passlib.context import CryptContext

pwd = CryptContext(schemes=["bcrypt"], deprecated="auto")

def hash_password(plaintext: str) -> str:
    return pwd.hash(plaintext)            # slow, salted — store this, never the password

def verify_password(plaintext: str, hashed: str) -> bool:
    return pwd.verify(plaintext, hashed)  # constant-time comparison against the stored hash

The third is CORS. Browsers enforce the same-origin policy: a single-page app served from app.example.com cannot, by default, call an API at api.example.com. CORS is the server’s way of declaring which other origins may call it, configured as middleware with an explicit allow-list. The mistake to avoid is the wildcard allow_origins=["*"] combined with credentials — a footgun that browsers reject anyway; name the origins you actually trust. Beyond these three, the validation from the Pydantic section is itself a security control: parsing every request at the boundary is how you stop malformed and malicious input from reaching your logic. Everything here is single-service auth; cross-service identity, token propagation, and gateway-level auth belong to the Python: Microservices chapter.
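The allow-list decision itself is small. The sketch below shows the logic a CORS middleware applies to a simple request, using a hypothetical `ALLOWED_ORIGINS` set and `cors_headers` helper. In a real service you would configure the framework's CORS middleware instead of writing this by hand.

```python
from typing import Optional

ALLOWED_ORIGINS = {"https://app.example.com"}   # hypothetical allow-list


def cors_headers(origin: Optional[str]) -> dict[str, str]:
    """Headers to add to a response for a request carrying this Origin header."""
    if origin is None or origin not in ALLOWED_ORIGINS:
        return {}                                   # no CORS headers: the browser blocks the read
    return {
        "Access-Control-Allow-Origin": origin,      # echo the exact origin, never "*"
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",                           # caches must key on the Origin header
    }


trusted = cors_headers("https://app.example.com")
untrusted = cors_headers("https://evil.example.net")
print(trusted, untrusted)
```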

War story: the synchronous call that serialized a service

A team shipped a FastAPI service that enriched every user-profile request with data from an internal scoring API. It passed every test and ran beautifully in staging, where one engineer clicked around at a time. In production it fell over within the hour: p99 latency climbed into the tens of seconds while CPU sat at 12%. The handler was async def, so it looked async — but inside it, someone had reached for the familiar synchronous requests.get(). Every time that line ran, it blocked the event loop for the full round trip, freezing not just that request but every other request the worker was juggling; concurrency made it worse, not better. The fix was one line — requests.get() became await client.get() with httpx.AsyncClient — and p99 dropped back under 50ms. The lesson has two halves that always travel together: an async def handler is a promise that you will not block the loop, and a single synchronous I/O call inside one breaks that promise for the whole process. When you can’t avoid a blocking call, make the route def (FastAPI runs it in a threadpool) rather than lying with async def. The same discipline applies to skipping response_model: returning a raw ORM object “to save a line” is how an internal password_hash ends up in a JSON response and then in a client’s logs — the typed response boundary exists precisely to make that leak impossible.

Serving in production

A development server is uvicorn app.main:app --reload — one process, hot reload, fine for one developer. Production differs in three ways: multiple processes, no reload, and clean lifecycle handling. The standard setup runs gunicorn as a process manager with uvicorn workers: gunicorn supervises a pool of workers, restarts any that die, and handles signals; each worker is a uvicorn instance running its own event loop. This gives you the two orthogonal axes of scaling, and keeping them straight is the whole game. More CPU cores → more gunicorn workers, because a single Python process is bound by the GIL to one core’s CPU. More concurrent I/O → more coroutines per worker, which the event loop gives you for free. A common starting point is one worker per core (sometimes 2 × cores + 1), each handling thousands of concurrent connections.
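The two starting points for worker count mentioned above reduce to a one-line formula each. The helper below is a hypothetical convenience for computing a first guess. The right number still comes from load testing.

```python
import os


def suggested_workers(cores: int, rule: str = "per-core") -> int:
    """Starting-point worker counts: one per core, or the 2 x cores + 1 rule."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if rule == "per-core":
        return cores
    if rule == "2n+1":
        return 2 * cores + 1
    raise ValueError(f"unknown rule: {rule}")


cores = os.cpu_count() or 1      # cpu_count() may return None on some platforms
print(f"{cores} cores -> {suggested_workers(cores)} or {suggested_workers(cores, '2n+1')} workers")
```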

# Production: gunicorn supervises N uvicorn workers (one event loop each).
CMD ["gunicorn", "app.main:app", \
     "-k", "uvicorn.workers.UvicornWorker", \
     "--bind", "0.0.0.0:8000", "--workers", "4"]

The other half of production-readiness is graceful startup and shutdown, and FastAPI gives you one place for both: the lifespan context manager. The code before yield runs once at startup — open the connection pool, connect to the cache, verify dependencies — and the code after yield runs once at shutdown. On startup, open expensive resources (a connection pool) exactly once and reuse them across requests; opening a pool per request is a classic pool-exhaustion outage. Mark the app ready only after initialization completes, which is why production services split a liveness probe (“is the process alive?”) from a readiness probe (“is it ready to serve?”) — the load balancer routes traffic only when readiness passes, so requests don’t hit a half-initialized app.

from contextlib import asynccontextmanager
from fastapi import FastAPI

# create_pool and settings are assumed to come from the application's own modules.

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.pool = await create_pool(settings.dsn)   # startup: open the pool once
    yield                                              # app serves requests here
    await app.state.pool.close()                       # shutdown: drain connections cleanly

app = FastAPI(lifespan=lifespan)

On shutdown, the ordering is reversed and the goal is to drain in-flight requests. When the orchestrator sends SIGTERM, you want a brief grace period during which the load balancer stops sending new traffic, then you let in-flight requests finish, then you close resources in reverse order of how you opened them. Skip the grace period and you reset connections mid-request, a ConnectionResetError for whoever was unlucky. A correct lifespan, a sensible worker count, and split health probes are most of what separates a service that survives a deploy from one that drops requests on every rollout.
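Closing resources in reverse order of opening is what `contextlib.AsyncExitStack` provides. The sketch below uses a hypothetical `resource` context manager in place of a real pool or cache client, and drives the lifespan by hand rather than through a server.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

order: list[str] = []


@asynccontextmanager
async def resource(name: str):
    """Hypothetical expensive resource: records when it opens and closes."""
    order.append(f"open {name}")
    try:
        yield name
    finally:
        order.append(f"close {name}")


@asynccontextmanager
async def lifespan_sketch():
    async with AsyncExitStack() as stack:
        await stack.enter_async_context(resource("db_pool"))
        await stack.enter_async_context(resource("cache"))
        order.append("ready")            # only now should readiness report true
        yield
        order.append("shutting down")
    # Leaving the stack closed the cache first, then the pool it may depend on.


async def run() -> None:
    async with lifespan_sketch():
        order.append("serving")


asyncio.run(run())
print(order)
```

Because the stack unwinds in last-in, first-out order, a resource that depends on another is always closed before the thing it depends on.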

Build it → Full production FastAPI services with lifespan-managed resources and worker configuration: Project 24: Synthetic Data Generator serves an async API, and Project 05: SaaS Web Platform wires auth, pooled sessions, and graceful startup/shutdown end to end.

This is the edge of single-service territory. When one service grows into many — a gateway in front of several services, retries and circuit breakers between them, distributed tracing across a call graph — you have crossed into the Python: Microservices chapter. The craft here, doing one service well, is the prerequisite for that one: a microservice is, first, a web service that gets the lifecycle right.

Practical exercise

Difficulty: Level I · Level II · Level III

Level I — A typed endpoint. Build a small FastAPI service with one resource (say, notes). Define a Pydantic request model with a constrained field (a non-empty title, a bounded-length body) and a separate response model that omits an internal field (owner_id). Implement POST and GET /{id} over an in-memory dict. Send a malformed request and confirm a 422 with a per-field error you never wrote, and confirm the omitted field never appears in a response. In a sentence, explain the difference between the handler’s return type hint and its response_model.
Level II — Inject auth and a session. Add a login endpoint that hashes passwords with bcrypt and issues a signed token, and a get_current_user dependency that validates it. Replace the in-memory store with a real database behind a get_db dependency that yields a request-scoped session, and protect the write endpoints. Then trace one request end to end: what uvicorn does, what middleware runs, how the router picks the handler, the order in which get_current_user and get_db resolve, and where the response model filters the output.
Level III — Make it production-correct. Audit every handler for blocking calls inside async def and fix them (async client, async driver, or demote to def). Prove it with a load test: fire a few hundred concurrent requests at an I/O-bound endpoint and show throughput scales, then reintroduce a time.sleep() and watch it collapse with the CPU idle. Configure gunicorn with uvicorn workers sized to your cores, add a lifespan that opens a pooled resource once and drains it on shutdown, and split liveness from readiness. Finally, write a short paragraph on where this single service stops being enough — what specific need (a second service, inter-service auth, a gateway) would send you to the Python: Microservices chapter, and why those patterns don’t belong here.

Summary

A modern Python web service is a single process running an asyncio event loop on top of ASGI, and a good framework’s job is to own the request lifecycle so you can own the domain. uvicorn turns HTTP bytes into ASGI messages; middleware handles cross-cutting concerns; the router selects a handler; dependency injection supplies its resources; the async handler runs on the loop and calls your domain; and the response is serialized back out. Because the handler runs on the loop, one process serves thousands of concurrent I/O-bound requests, provided you never block it with a synchronous call. Pydantic turns the boundaries into typed contracts, parsing untrusted input once at the door and guaranteeing the output matches what you promised. Depends injects sessions, auth, and config without globals, keeping handlers testable. And production serving is gunicorn-supervised uvicorn workers plus a lifespan that opens resources once and drains them cleanly: the foundation the Microservices chapter builds on when a service multiplies.

Key takeaways

The framework’s job is the request lifecycle — server, middleware, routing, DI, handler, response — so your job is just the handler’s domain logic.
ASGI + async lets one process serve thousands of concurrent I/O-bound requests; the one rule that breaks it is blocking the loop with a synchronous call inside async def.
Make a route async def only if it awaits; if it must do blocking work with no async option, make it def and let FastAPI run it in a threadpool.
Pydantic models are typed boundaries: parse-don’t-validate at the request edge, and a declared response_model is the wall that stops internal fields from leaking.
Depends injects request-scoped resources (DB sessions, current user) without globals; never store a session as an attribute, and override dependencies to test handlers.
Production = gunicorn + uvicorn workers (cores → workers, I/O → coroutines) plus a lifespan that opens resources once and drains them on SIGTERM.

Connections to other chapters

Concurrency and Parallelism Models (prerequisite): the asyncio event loop is the engine this chapter runs on. The “don’t block the loop” rule, cooperative multitasking, and the difference between await asyncio.sleep() and time.sleep() are that chapter’s material — set in its comparative treatment of how six languages model concurrency — applied directly to handling HTTP requests at scale.
Python: Microservices (sibling): this chapter is one service done well; that one is what happens when one becomes many — service decomposition, inter-service communication, gateways, retries, circuit breakers. A microservice is first a web service that gets the lifecycle right, so this chapter is its foundation, not its rival.
Python: Design Patterns (extension): dependency injection and the handler-to-domain seam are clean-architecture patterns in the small. Keeping handlers thin, pushing logic into an injected service layer, and depending on abstractions rather than globals is what that chapter formalizes.
Containerization with Docker (Part V, extension): how this service ships. A FastAPI app is the canonical interpreter-plus-wheels image — a slim base, the dependency tree, a gunicorn entrypoint — and the multi-stage build and non-root patterns there turn the service you built here into the immutable artifact a platform runs.

Further reading

Essential

FastAPI documentation — the official tutorial and user guide; the canonical reference for Depends, response models, lifespan, and OpenAPI generation.
Pydantic documentation (v2) — models, validators, and the model_validate/model_dump API that drives every typed boundary in this chapter.

Deep dives

The ASGI specification — the small, precise contract (scope, receive, send) that every async Python web server and framework implements; reading it demystifies what uvicorn actually hands your app.
Starlette documentation — the ASGI toolkit underneath FastAPI; middleware, routing, and the test client all originate here, and understanding it explains why FastAPI behaves as it does.

Historical context

PEP 3333 — Python Web Server Gateway Interface (WSGI) and the WSGI→ASGI evolution — the synchronous standard that defined Python web development for a decade, and the reasoning behind the async successor that made the event-loop model in this chapter possible.

--- title: "Python: Web Development" keywords: [fastapi, asgi, async, pydantic, dependency injection, rest api, validation, uvicorn, web framework, request lifecycle] difficulty: intermediate prerequisites: [python-language-features, concurrency-models] estimated_time: "3-4 hours" --- ## Introduction The service was a textbook Flask app — a few dozen routes, a Postgres database, synchronous Gunicorn workers behind nginx. It served an internal dashboard fine for two years. Then the product team wired it to a third-party billing API, and every page that rendered a customer's invoices now made an outbound HTTP call that took, on a good day, four hundred milliseconds. Nobody thought much of it; four hundred milliseconds is fast. The trouble showed up the first morning the sales team logged in together. The app ran sixteen synchronous workers, and each worker does exactly one thing at a time. When a worker picks up a request that calls the billing API, it sits there — holding the worker, doing nothing — for the full round trip. With sixteen workers, the service could handle about forty such requests per second before every worker was blocked, parked on a socket. The CPUs were idle, the database was bored, and yet new requests queued behind a wall of workers all asleep on the network, and the dashboard started returning 502s. The bottleneck wasn't compute or the database; it was that a synchronous worker blocked on I/O is a worker doing nothing while pretending to be busy. The same team had a second, quieter problem. Every endpoint hand-parsed its own request body — `request.get_json()`, a thicket of `if "email" not in data` checks, manual coercion of strings to integers. Half the 500s in the logs weren't bugs in the business logic; they were a missing field or a string where an integer was expected, blowing up three layers deep instead of being rejected at the door. The validation was real code, untested, duplicated across forty handlers, subtly different in each. 
Both problems share a root and a fix. A web framework's actual job is to **own the request lifecycle** — turn a raw HTTP request into a typed Python call, run it without wasting a thread waiting on the network, validate what comes in and serialize what goes out — so you are left writing the part that is genuinely yours: the domain logic. The team's pain was the framework's job leaking into application code. This chapter is about doing that job well, for a single service. (When one service becomes many — service boundaries, inter-service calls, gateways — that is the sibling *Python: Microservices* chapter, and we point there rather than duplicate it.) ### The Core Insight A modern Python web service is built on two ideas that compose: **ASGI** and **async**. WSGI, the older standard that Flask and classic Django speak, is synchronous by construction — it hands a worker one request, the worker runs your handler start to finish, and only then is the worker free for the next request. That model is fine when handlers are CPU-bound and fast. It falls apart exactly when a handler waits on I/O, because the waiting is invisible to the worker pool: a blocked worker looks identical to a busy one. ASGI — the Asynchronous Server Gateway Interface — is the async successor. It lets a single process run an **event loop** that juggles thousands of in-flight requests on one thread. When a handler hits an `await` on a network call, it doesn't block the thread; it *yields* control back to the loop, which runs some other request's code until the first one's I/O completes. The four-hundred-millisecond billing call is no longer four hundred milliseconds of a worker doing nothing — it's four hundred milliseconds during which the loop serves hundreds of other requests. For I/O-bound work, which is most web work, this changes the throughput ceiling by an order of magnitude on the same hardware. 
The second half of the insight is that the framework hands **validation and serialization to the type system**. Instead of hand-parsing JSON, you declare a Pydantic model describing a valid request, and the framework parses the body into it — or rejects it with a precise 422 before your code runs. The same trick runs in reverse on the way out: you declare a response model, and the framework guarantees the response matches it, stripping any field you didn't promise. The forty hand-written validators collapse into a handful of typed declarations. You write handlers; the framework owns the lifecycle. ### A mental model Think of the request lifecycle as an **assembly line**. A raw HTTP request enters at one end as unstructured bytes on a socket and leaves the other end as a structured response, passing through a fixed sequence of stations, each doing one job and handing the work along. The **ASGI server** is the intake — it pulls bytes off the wire and turns them into a request the rest of the line understands. The **middleware chain** wraps every request: add CORS headers, compress, log, check auth. **Routing** is the switch that sends the request to the right station for its path and method. **Dependency injection** is the parts feeder — it gathers what the handler needs (a database session, the authenticated user, configuration) and sets it beside the work. The **handler** is the one station that is *yours*: it does the domain logic and produces a result. Then the line runs in reverse — the result is serialized and threaded back out through the same middleware to the client. The crucial detail the analogy captures: the framework is the conveyor and the stations are mostly off-the-shelf; your handler is one station on a long line you didn't build. And because the line runs on an event loop, a station that has to wait — your handler `await`-ing the database — doesn't stop the conveyor. 
The loop simply runs another request's line while yours waits, which is why one process keeps thousands of requests moving at once. @fig-request-lifecycle traces a single request down the line and back. ### When to use what Three frameworks dominate Python web work, and the choice is about the shape of service rather than which is "best." **Reach for FastAPI** when you are building an API-first, async service — a JSON backend for a single-page app or mobile client, an ML model server, anything that spends its time on I/O. It is ASGI-native, async by default, and validation-by-types is its whole design; if your service is the shape this chapter describes, this is the default, and the code here is FastAPI. **Reach for Django** when you want batteries included and a server-rendered or admin-heavy app: a built-in ORM, a production-grade admin, sessions, forms, and auth out of the box. (Django can do async since 4.x, but its center of gravity is the synchronous full-stack world.) **Reach for Flask** when the service is small and synchronous and you value seeing every moving part: a prototype, a tiny internal endpoint, a glue service — Flask is WSGI and minimal by design, which is exactly why the opening story's service hit a wall the moment it became I/O-bound under concurrency. The rest of this chapter assumes the first answer. 
### What you'll learn - How an HTTP request travels the ASGI lifecycle — server, middleware, routing, dependency injection, handler, response — and where your code actually sits - Why async lets one process serve thousands of concurrent I/O-bound requests, and the one rule that breaks it: never block the event loop - How Pydantic turns request and response boundaries into typed contracts, and why "parse, don't validate" makes your handler simpler and safer - How `Depends` injects auth, sessions, and config without globals, and why that makes handlers testable - The security basics of a single service: token auth, password hashing, and CORS - How to serve the app in production: gunicorn/uvicorn workers, the two axes of scaling, and graceful startup and shutdown ### Prerequisites - **Python: Language Features** — decorators, type hints, and context managers; FastAPI and Pydantic are driven by annotations, and the lifespan manager is a context manager - **Concurrency and Parallelism Models** — `async`/`await`, coroutines, and the asyncio event loop, the engine this chapter runs on (covered there comparatively across all six languages) - Working knowledge of HTTP: methods, status codes, headers, and JSON bodies --- ## The ASGI request lifecycle Everything else in this chapter is a station on the line, so it pays to walk the line once, end to end. The diagram below is the map; the prose that follows is the tour. ![The ASGI request lifecycle: uvicorn accepts the request, it passes through middleware and routing, dependencies are resolved and injected, the async handler runs on the event loop and calls the domain, and the response flows back out — letting one process serve many concurrent I/O-bound requests.](../assets/diagrams/rendered/py_request_lifecycle.svg){#fig-request-lifecycle .lightbox} At the top of the line sits the **ASGI server** — in practice, **uvicorn**. 
It owns the socket: it accepts the TCP connection, parses the raw HTTP bytes, and translates them into the ASGI protocol, a small contract of three objects (a `scope` dict describing the request, plus `receive` and `send` callables for streaming the body and response). Critically, uvicorn runs an **asyncio event loop** in that one process — often on uvloop, a fast libuv-based loop — and it is this loop that lets the single process keep many requests in flight. Your application never speaks raw HTTP; it speaks ASGI to uvicorn, and uvicorn speaks HTTP to the world. Below uvicorn, the request enters the **middleware chain**. Middleware wraps every request and response: it runs on the way in, before routing, and again on the way out, after the handler. This is where cross-cutting concerns live — CORS headers, GZip compression, request logging, broad auth checks — precisely the things you do not want scattered across forty handlers. FastAPI is built on Starlette, so the middleware is ASGI middleware underneath. Order matters: middleware added first wraps the outside, seeing the request earliest and the response latest — and, as the war story notes, configure the full stack before the app starts serving rather than after mounting routes. Past middleware, the **router** matches the request's path and method to exactly one handler. `GET /users/42` and `POST /users` are different routes even though they share a prefix; the router resolves the path parameters (`42` becomes `user_id: int`) and selects the function to run. If nothing matches, the router itself produces the 404 — your code never sees it. Just before the handler runs, FastAPI resolves its **dependencies** — the parts feeder (a full section follows). It inspects the handler's signature, sees what it needs (a database session, the current user, config), runs the callables that produce those values, and injects them as arguments. 
Then, and only then, the **handler** runs: it receives a fully typed, validated request and the resources it asked for, does the domain work — often `await`-ing the database or an external service — and returns a result. That result is serialized through the response model and threaded back up through the same middleware to uvicorn, which writes the HTTP response to the socket. One trip down the line, one trip back. ## Async views and the "don't block the loop" rule The reason this lifecycle scales is that the handler runs on the event loop, and the loop's defining property is that it never waits. When your handler hits `await db.execute(...)`, the coroutine yields — "I have nothing to do until this I/O finishes, go run someone else." The loop picks another request that *does* have work, runs it until *it* awaits, and so on. I/O wait is free; the loop fills it with other people's work. This is cooperative multitasking, covered comparatively in the **Concurrency and Parallelism Models** chapter, and FastAPI is one long application of it. There is exactly one way to ruin it, and it is the most common production mistake in async Python: **blocking the loop**. A coroutine yields control only at an `await`. If you make a *synchronous* blocking call inside an `async def` handler — a `time.sleep(2)`, a `requests.get()`, a CPU-bound loop, a synchronous database driver — the loop cannot take control back, because you never gave it an `await` to take it at. For the full duration of that call, the whole worker is frozen — not just your request, but *every* concurrent request that worker was juggling. One blocking call serializes the entire process, and the symptom is maddening: latency spikes under load with the CPU near idle. The fix has two parts. First, inside `async def` handlers, use async libraries for I/O: `await asyncio.sleep()`, `httpx.AsyncClient`, an async driver (`asyncpg`) with async SQLAlchemy. 
Second — the escape hatch — if you *must* call something blocking (no async version, or a genuinely CPU-bound step), push it onto a threadpool with `run_in_threadpool`, or simply declare the route as a plain `def`: FastAPI runs `def` handlers in a threadpool, off the loop, so they can't freeze it. The rule of thumb: **make a route `async def` only if it actually `await`s something; if it does blocking work with no async option, make it `def` and let the framework move it off the loop.** ```python import asyncio import httpx from fastapi import FastAPI app = FastAPI() @app.get("/quote") async def get_quote() -> dict[str, float]: """Correct: an async client, awaited — the loop runs others while we wait.""" async with httpx.AsyncClient(timeout=5.0) as client: resp = await client.get("https://example.com/price") # yields here return {"price": resp.json()["price"]} ``` The handler above looks ordinary, but the `await` on the HTTP call is what keeps the process responsive: during that network round trip, this coroutine is parked and the loop is busy serving other requests. Swap `httpx.AsyncClient` for synchronous `requests.get()` and the line goes quiet — that one substitution is the difference between the service from the introduction and the service that replaced it. ## Pydantic validation: typed boundaries A web service sits at a trust boundary. Everything from the network is untrusted bytes; everything inside your handler should be trusted, typed Python. The job of the request boundary is to convert the former into the latter exactly once, at the edge, so the interior never wonders whether `user_id` is really an integer or `email` is really present. **Pydantic** is how FastAPI does this. You declare a model — a class describing fields, types, and constraints — and FastAPI uses it to *parse* the incoming JSON into a typed object. If the JSON doesn't fit, the request is rejected with a 422 and a precise per-field error report, and your handler never runs. 
This is the "parse, don't validate" principle, and the distinction is sharp. Validation checks a value and leaves you holding the same untyped data, hoping you remember it passed. Parsing transforms it into a type that *cannot* be malformed — once you hold a `UserCreate`, the type system guarantees `email` is a valid email and `password` is at least eight characters, because an object violating those couldn't have been constructed. The check happens once, at the door; the rest of the code is freed from defensive re-checking. ```python from pydantic import BaseModel, EmailStr, Field class UserCreate(BaseModel): """The request contract: what a valid 'create user' body must contain.""" email: EmailStr # rejected at the door if malformed username: str = Field(min_length=3, max_length=50) password: str = Field(min_length=8) # never appears in any response model class UserOut(BaseModel): """The response contract: exactly what we promise to send back.""" id: int email: EmailStr username: str model_config = {"from_attributes": True} # build directly from an ORM object ``` The two models above are deliberately different, and that difference is the most important habit in the chapter. `UserCreate` has a `password`; `UserOut` does not. When a handler declares `UserCreate` as its body and `UserOut` as its `response_model`, FastAPI parses the request into the first and serializes the response through the second — so the password hash, internal flags, and any field not named in `UserOut` are physically incapable of reaching the client, even if you return a full ORM object. Declaring `response_model` is not optional politeness; it is the wall that stops internal fields from leaking over the wire. 
The endpoint that ties them together is small precisely because the models carry the weight: ```python from fastapi import APIRouter, HTTPException, status router = APIRouter() @router.post("/users", response_model=UserOut, status_code=status.HTTP_201_CREATED) async def create_user(body: UserCreate) -> UserOut: """`body` is already validated; the return is filtered through UserOut.""" if await users.email_exists(body.email): raise HTTPException(status.HTTP_409_CONFLICT, "Email already registered") return await users.create(body) # an ORM object; UserOut strips it to the contract ``` There is no `if "email" not in data`, no manual type coercion, no hand-written 422 response. The forty fragile validators from the introduction became two model classes and a `response_model=` argument, and as a bonus FastAPI generated an OpenAPI schema and interactive Swagger docs from those same declarations — the types are simultaneously the validator, the serializer, and the documentation. > **Build it →** See typed boundaries at production scale: the FastAPI services in > [Project 05: SaaS Web Platform](https://github.com/jchu0/applied-cs-projects/tree/main/05-saas-web-platform) > and [Project 50: Feature Engineering Platform](https://github.com/jchu0/applied-cs-projects/tree/main/50-feature-engineering-platform) > use Pydantic request/response models and dependency-injected sessions throughout. ## Dependency injection with `Depends` The thing that most distinguishes FastAPI from Flask is how the handler gets its resources. A handler usually needs a database session, the authenticated user, and some configuration. The naive approach reaches for globals and pays for it in untestability and lifecycle bugs. FastAPI's answer is **dependency injection**: the handler declares what it needs as typed parameters wrapped in `Depends`, and the framework produces those values per request and passes them in. A dependency is just a callable. 
FastAPI inspects the handler's signature at startup, sees each `Depends(...)`, and builds a dependency graph — because dependencies can themselves depend on other dependencies. The classic chain is `get_current_user` depending on `get_db` (to look the user up) and on the bearer token (to identify them). FastAPI resolves the whole graph per request, caching any dependency used more than once within a request so it runs only once. The result is that a handler's signature reads as a precise statement of its needs, each one supplied by the framework rather than a global. The most important pattern is the **request-scoped resource** — above all, the database session. A dependency that `yield`s gives you setup-and-teardown around the request: the code before `yield` runs before the handler, the value is injected, and the code after `yield` runs once the response is sent, guaranteeing cleanup even if the handler raised. ```python from typing import Annotated from fastapi import Depends from sqlalchemy.ext.asyncio import AsyncSession async def get_db() -> AsyncGenerator[AsyncSession, None]: """One session per request; closed automatically when the request ends.""" async with async_session_factory() as session: yield session # injected into the handler # after the response is sent, the context manager closes the session DbSession = Annotated[AsyncSession, Depends(get_db)] # a reusable, named dependency async def get_current_user(token: Annotated[str, Depends(oauth2_scheme)], db: DbSession) -> User: """Depends on both the token and the DB — FastAPI wires the graph.""" user = await users.from_token(db, token) if user is None: raise HTTPException(status.HTTP_401_UNAUTHORIZED, "Invalid credentials") return user ``` The payoff is twofold. In production, every request gets a fresh session closed exactly once, sidestepping a whole genus of pooling and stale-connection bugs — provided you obey one rule: a service or handler must **never store the session as an attribute**. 
Store one on a long-lived object and the next request reuses a closed session, producing the infamous `Event loop is closed` errors. The second payoff is testing: because the handler asks for `get_db` rather than a global, a test can override that one dependency with a transactional test session and exercise the real handler against a real-but-disposable database — clean architecture in the small, with the framework enforcing the handler-to-domain seam. (That seam is the subject of the **Python: Design Patterns** chapter.) ## Auth and security for one service Authentication for a single service has three moving parts, and FastAPI's dependency system makes each a clean unit. The first is **password hashing**. You never store passwords; you store a slow, salted hash and compare hashes on login. Use a purpose-built password hash — bcrypt via `passlib` is the standard — so that even a breach doesn't hand over plaintext, and so the hash is deliberately slow enough to make brute force impractical. Hashing on signup and verifying on login are two small functions, and they are the entire defense between a leaked table and a credential-stuffing disaster. The second is **tokens**. After verifying a password at a login endpoint, the service issues a token — a JWT is common — that the client sends on later requests in the `Authorization: Bearer` header. The token is signed with a server-side secret, so the service can verify it without a database lookup, and it carries an expiry. The pattern that earns its keep is **short-lived access tokens** (15–30 minutes) paired with **longer-lived refresh tokens**: if an access token leaks, its window is small, and the client uses the refresh token to mint a new one without re-entering a password. Token verification then becomes a dependency — `get_current_user` from above — so every protected endpoint gets auth simply by declaring it needs the current user. 
```python from passlib.context import CryptContext pwd = CryptContext(schemes=["bcrypt"], deprecated="auto") def hash_password(plaintext: str) -> str: return pwd.hash(plaintext) # slow, salted — store this, never the password def verify_password(plaintext: str, hashed: str) -> bool: return pwd.verify(plaintext, hashed) # constant-time comparison against the stored hash ``` The third is **CORS**. Browsers enforce the same-origin policy: a single-page app served from `app.example.com` cannot, by default, call an API at `api.example.com`. CORS is the server's way of declaring which other origins may call it, configured as middleware with an explicit allow-list. The mistake to avoid is the wildcard `allow_origins=["*"]` combined with credentials — a footgun that browsers reject anyway; name the origins you actually trust. Beyond these three, the validation from the Pydantic section is itself a security control: parsing every request at the boundary is how you stop malformed and malicious input from reaching your logic. Everything here is *single-service* auth; cross-service identity, token propagation, and gateway-level auth belong to the **Python: Microservices** chapter. ::: {.callout-warning} ## War story: the synchronous call that serialized a service A team shipped a FastAPI service that enriched every user-profile request with data from an internal scoring API. It passed every test and ran beautifully in staging, where one engineer clicked around at a time. In production it fell over within the hour: p99 latency climbed into the tens of seconds while CPU sat at 12%. The handler was `async def`, so it *looked* async — but inside it, someone had reached for the familiar synchronous `requests.get()`. Every time that line ran, it blocked the event loop for the full round trip, freezing not just that request but every other request the worker was juggling; concurrency made it worse, not better. 
The fix was one line — `requests.get()` became `await client.get()` with `httpx.AsyncClient` — and p99 dropped back under 50ms. The lesson has two halves that always travel together: an `async def` handler is a *promise* that you will not block the loop, and a single synchronous I/O call inside one breaks that promise for the whole process. When you can't avoid a blocking call, make the route `def` (FastAPI runs it in a threadpool) rather than lying with `async def`. The same discipline applies to skipping `response_model`: returning a raw ORM object "to save a line" is how an internal `password_hash` ends up in a JSON response and then in a client's logs — the typed response boundary exists precisely to make that leak impossible. ::: ## Serving in production A development server is `uvicorn app.main:app --reload` — one process, hot reload, fine for one developer. Production differs in three ways: multiple processes, no reload, and clean lifecycle handling. The standard setup runs **gunicorn as a process manager with uvicorn workers**: gunicorn supervises a pool of workers, restarts any that die, and handles signals; each worker is a uvicorn instance running its own event loop. This gives you the two orthogonal axes of scaling, and keeping them straight is the whole game. **More CPU cores → more gunicorn workers**, because a single Python process is bound by the GIL to one core's CPU. **More concurrent I/O → more coroutines per worker**, which the event loop gives you for free. A common starting point is one worker per core (sometimes `2 × cores + 1`), each handling thousands of concurrent connections. ```dockerfile # Production: gunicorn supervises N uvicorn workers (one event loop each). CMD ["gunicorn", "app.main:app", \ "-k", "uvicorn.workers.UvicornWorker", \ "--bind", "0.0.0.0:8000", "--workers", "4"] ``` The other half of production-readiness is **graceful startup and shutdown**, and FastAPI gives you one place for both: the `lifespan` context manager. 
The code before `yield` runs once at startup — open the connection pool, connect to the cache, verify dependencies — and the code after `yield` runs once at shutdown. On startup, open expensive resources (a connection pool) exactly once and reuse them across requests; opening a pool per request is a classic pool-exhaustion outage. Mark the app *ready* only after initialization completes, which is why production services split a **liveness** probe ("is the process alive?") from a **readiness** probe ("is it ready to serve?") — the load balancer routes traffic only when readiness passes, so requests don't hit a half-initialized app. ```python from contextlib import asynccontextmanager from fastapi import FastAPI @asynccontextmanager async def lifespan(app: FastAPI): app.state.pool = await create_pool(settings.dsn) # startup: open the pool once yield # app serves requests here await app.state.pool.close() # shutdown: drain connections cleanly app = FastAPI(lifespan=lifespan) ``` On shutdown, the ordering is reversed and the goal is to drain in flight. When the orchestrator sends `SIGTERM`, you want a brief grace period during which the load balancer stops sending new traffic, then you let in-flight requests finish, then you close resources in reverse order of how you opened them. Skip the grace period and you reset connections mid-request — `ConnectionResetError` for whoever was unlucky. A correct lifespan, a sensible worker count, and split health probes are most of what separates a service that survives a deploy from one that drops requests on every rollout. 
> **Build it →** Full production FastAPI services with lifespan-managed resources and > worker configuration: [Project 24: Synthetic Data Generator](https://github.com/jchu0/applied-cs-projects/tree/main/24-synthetic-data-generator) > serves an async API, and [Project 05: SaaS Web Platform](https://github.com/jchu0/applied-cs-projects/tree/main/05-saas-web-platform) > wires auth, pooled sessions, and graceful startup/shutdown end to end. This is the edge of single-service territory. When one service grows into many — a gateway in front of several services, retries and circuit breakers between them, distributed tracing across a call graph — you have crossed into the **Python: Microservices** chapter. The craft here, doing one service well, is the prerequisite for that one: a microservice is, first, a web service that gets the lifecycle right. --- ## Practical exercise **Difficulty:** Level I · Level II · Level III 1. **Level I — A typed endpoint.** Build a small FastAPI service with one resource (say, `notes`). Define a Pydantic request model with a constrained field (a non-empty title, a bounded-length body) and a *separate* response model that omits an internal field (`owner_id`). Implement `POST` and `GET /{id}` over an in-memory dict. Send a malformed request and confirm a 422 with a per-field error you never wrote, and confirm the omitted field never appears in a response. In a sentence, explain the difference between the handler's return type hint and its `response_model`. 2. **Level II — Inject auth and a session.** Add a login endpoint that hashes passwords with bcrypt and issues a signed token, and a `get_current_user` dependency that validates it. Replace the in-memory store with a real database behind a `get_db` dependency that yields a request-scoped session, and protect the write endpoints. 
Then trace one request end to end: what uvicorn does, what middleware runs, how the router picks the handler, the order in which `get_current_user` and `get_db` resolve, and where the response model filters the output. 3. **Level III — Make it production-correct.** Audit every handler for blocking calls inside `async def` and fix them (async client, async driver, or demote to `def`). Prove it with a load test: fire a few hundred concurrent requests at an I/O-bound endpoint and show throughput scales, then reintroduce a `time.sleep()` and watch it collapse with the CPU idle. Configure gunicorn with uvicorn workers sized to your cores, add a `lifespan` that opens a pooled resource once and drains it on shutdown, and split liveness from readiness. Finally, write a short paragraph on where this single service stops being enough — what specific need (a second service, inter-service auth, a gateway) would send you to the *Python: Microservices* chapter, and why those patterns don't belong here. ## Summary A modern Python web service is a single process running an asyncio event loop on top of ASGI, and a good framework's job is to own the request lifecycle so you can own the domain. uvicorn turns HTTP bytes into typed Python calls; middleware handles cross-cutting concerns; the router selects a handler; dependency injection supplies its resources; the async handler runs on the loop and calls your domain; and the response is serialized back out. Because the handler runs on the loop, one process serves thousands of concurrent I/O-bound requests — provided you never block it with a synchronous call. Pydantic turns the boundaries into typed contracts, parsing untrusted input once at the door and guaranteeing the output matches what you promised. `Depends` injects sessions, auth, and config without globals, keeping handlers testable. 
And production serving is gunicorn-supervised uvicorn workers plus a lifespan that opens resources once and drains them cleanly — the foundation the Microservices chapter builds on when a service multiplies.

### Key takeaways

- The framework's job is the request lifecycle — server, middleware, routing, DI, handler, response — so your job is just the handler's domain logic.
- ASGI + async lets one process serve thousands of concurrent I/O-bound requests; the one rule that breaks it is blocking the loop with a synchronous call inside `async def`.
- Make a route `async def` only if it `await`s; if it must do blocking work with no async option, make it `def` and let FastAPI run it in a threadpool.
- Pydantic models are typed boundaries: parse-don't-validate at the request edge, and a declared `response_model` is the wall that stops internal fields from leaking.
- `Depends` injects request-scoped resources (DB sessions, current user) without globals; never store a session as an attribute, and override dependencies to test handlers.
- Production = gunicorn + uvicorn workers (cores → workers, I/O → coroutines) plus a `lifespan` that opens resources once and drains them on `SIGTERM`.

### Connections to other chapters

- **Concurrency and Parallelism Models** (prerequisite): the asyncio event loop is the engine this chapter runs on. The "don't block the loop" rule, cooperative multitasking, and the difference between `await asyncio.sleep()` and `time.sleep()` are that chapter's material — set in its comparative treatment of how six languages model concurrency — applied directly to handling HTTP requests at scale.
- **Python: Microservices** (sibling): this chapter is one service done well; that one is what happens when one becomes many — service decomposition, inter-service communication, gateways, retries, circuit breakers. A microservice is first a web service that gets the lifecycle right, so this chapter is its foundation, not its rival.
- **Python: Design Patterns** (extension): dependency injection and the handler-to-domain seam are clean-architecture patterns in the small. Keeping handlers thin, pushing logic into an injected service layer, and depending on abstractions rather than globals is what that chapter formalizes.
- **Containerization with Docker** (Part V, extension): how this service ships. A FastAPI app is the canonical interpreter-plus-wheels image — a slim base, the dependency tree, a gunicorn entrypoint — and the multi-stage build and non-root patterns there turn the service you built here into the immutable artifact a platform runs.

## Further reading

### Essential

- *FastAPI documentation* — the official tutorial and user guide; the canonical reference for `Depends`, response models, lifespan, and OpenAPI generation.
- *Pydantic documentation* (v2) — models, validators, and the `model_validate`/`model_dump` API that drives every typed boundary in this chapter.

### Deep dives

- *The ASGI specification* — the small, precise contract (`scope`, `receive`, `send`) that every async Python web server and framework implements; reading it demystifies what uvicorn actually hands your app.
- *Starlette documentation* — the ASGI toolkit underneath FastAPI; middleware, routing, and the test client all originate here, and understanding it explains why FastAPI behaves as it does.

### Historical context

- *PEP 3333 — Python Web Server Gateway Interface (WSGI)* and the *WSGI→ASGI evolution* — the synchronous standard that defined Python web development for a decade, and the reasoning behind the async successor that made the event-loop model in this chapter possible.