Building Your First LLM Application

The gap between calling an API and shipping an app — a companion to Chapter 4 of my book, AI Engineering.

James Hu · 3 min read · Updated May 31, 2026
Tags: llm, ai-engineering, rag, getting-started, book, tutorial

Where I Started

On the distance between a single API call and an application you can actually run.

Most people's first contact with a large language model is a one-liner: send a string, print the response. It feels like magic, and it is — for about five minutes. Then you try to build something real on top of it, and the magic turns into a list of unglamorous questions. Where do the API keys live? How do you keep them out of your shell history and your git repo? What happens when the call times out, or the model returns something you didn't expect? Where does the conversation state go?

None of that is about prompting. It's about engineering. And it's the part nobody shows you in the demo.

What a First App Actually Involves

When I set out to write the introductory project for my book, I deliberately picked something small but complete: a document question-answering assistant. Small enough to finish in an afternoon; complete enough that you hit every real concern at least once.

The shape of it is the shape of almost every LLM app you'll ever build:

  • A clean environment. A virtual environment, dependencies pinned, secrets in a .env file that never gets committed. Boring, and the thing people skip and regret.
  • Configuration as a first-class citizen. One place that knows your model name, your API key, your defaults — not magic strings scattered across files.
  • A first real call. Sending a prompt, getting a completion, and — crucially — handling the case where it fails.
  • Context. The moment your app needs to answer questions about your documents rather than the whole internet, you've discovered retrieval. Even the simplest version — find the relevant text, paste it into the prompt — is the seed of every RAG system you'll build later.
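The configuration and first-call items above can be sketched in a few lines. This is a minimal illustration, not the book's code: the environment variable names (LLM_API_KEY, LLM_MODEL, LLM_TIMEOUT_S, LLM_MAX_RETRIES), the Settings class, and the send callable are all hypothetical stand-ins, and it assumes the .env file has already been loaded into the process environment by whatever tool you prefer.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Settings:
    """One place that knows the API key, model name, and defaults."""
    api_key: str
    model: str
    timeout_s: float
    max_retries: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Variable names and default values here are assumptions for illustration.
    api_key = env.get("LLM_API_KEY", "")
    if not api_key:
        raise RuntimeError("LLM_API_KEY is not set; add it to .env or the environment")
    return Settings(
        api_key=api_key,
        model=env.get("LLM_MODEL", "example-model"),
        timeout_s=float(env.get("LLM_TIMEOUT_S", "30")),
        max_retries=int(env.get("LLM_MAX_RETRIES", "3")),
    )


def complete_with_retry(
    send: Callable[[str], str],
    prompt: str,
    max_retries: int,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send(prompt); retry timeouts and connection errors with backoff."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            reply = send(prompt)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < max_retries:
                # Exponential backoff: 0.5s, 1s, 2s, ...
                sleep(base_delay_s * 2 ** attempt)
            continue
        if not reply.strip():
            # The call succeeded but returned nothing usable.
            raise ValueError("model returned an empty response")
        return reply
    raise RuntimeError(f"giving up after {max_retries + 1} attempts") from last_error
```

Here send stands in for whichever client call you end up using. Passing it in as a plain function keeps the retry logic testable without a network connection or a real key.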

That last step is the one that flips a toy into a tool. The model didn't get smarter; you gave it the right context at the right moment. That instinct — the answer is usually better context, not a better prompt — is most of the job.
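To make "find the relevant text, paste it into the prompt" concrete, here is a deliberately naive sketch. It scores paragraphs by word overlap with the question, which is an assumption chosen for simplicity and not the method the chapter uses; the function names and the prompt wording are hypothetical too.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_into_chunks(document: str) -> list[str]:
    """Treat each blank-line-separated paragraph as one chunk."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return up to k chunks sharing the most words with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(c)), i, c) for i, c in enumerate(chunks)]
    relevant = [s for s in scored if s[0] > 0]
    # Highest overlap first; ties keep document order.
    relevant.sort(key=lambda s: (-s[0], s[1]))
    return [c for _, _, c in relevant[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Paste the retrieved text into the prompt ahead of the question."""
    context = "\n\n".join(context_chunks) if context_chunks else "(no relevant text found)"
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


document = (
    "Refunds are issued within 14 days.\n\n"
    "Shipping takes 3 to 5 business days.\n\n"
    "Our office is in Lisbon."
)
question = "How long does shipping take?"
prompt = build_prompt(question, top_chunks(question, split_into_chunks(document)))
```

Word overlap misses synonyms and word forms ("take" does not match "takes"), which is the kind of limitation that embedding-based retrieval addresses. The overall shape stays the same, though: retrieve, then prompt.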

Why Build End-to-End First

It's tempting to start with theory: transformers, embeddings, attention. I think that's backwards for an engineer. Build the whole loop once — keys, config, a call, some context, a response on screen — and the theory has somewhere to land. You learn what's load-bearing by watching what breaks.

The goal of a first app isn't a good app. It's a map. Once you've stood up the end-to-end thing, every later chapter — prompt engineering, real RAG, agents, deployment — is just deepening a piece you've already touched.


This is a companion to Chapter 4, "Your First LLM Application," in my book, AI Engineering: Building Production-Ready LLM Applications. The full chapter walks through the complete document Q&A build, step by step, with runnable code.

Read the full chapter free → book.jameshu.io
