How LLMs Work
Build an intuition-first understanding of Large Language Models. Follow Nina, a product engineer, from the physical "two-file" reality through training, scaling, and the limits of what models can do on their own.
Why take this course?
A radically transparent demystification of AI. Strip away the hype and explore the anatomy, evolution, and limits of modern LLMs through direct analogies, visual models, and a hands-on engineering narrative.
Course Modules
Before the two-file reality, there were decades of failed attempts. Follow the arc from counting words (Bag-of-Words) to meaning vectors (word2vec) to contextual attention to the Transformer breakthrough — and understand why decoder-only models like GPT became the LLMs we know today.
Learning Goals
- Explain why computers need numbers to process language, and how each approach solved the failures of the last.
- Describe the Bag-of-Words model, its strengths, and its critical limitations.
- Understand how embeddings and attention solved the context problem (explored in depth in later courses).
- Understand how the Transformer solved the sequential bottleneck of RNNs.
- Distinguish encoder-only (BERT) from decoder-only (GPT) Transformers and when each is appropriate.
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.

The Fundamental Problem: Computers Need Numbers
Nina notices something odd. She asks her model to summarize two articles — one about a "river bank" and one about a "ban…
Bag-of-Words: The Naive First Attempt
The earliest approach was brutally simple: count the words.
In the Bag-of-Words model, a document becomes a vec…
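To make the counting idea concrete, here is a minimal Bag-of-Words sketch (the toy sentences are my own, not from the course material). Because the representation is an unordered multiset of word counts, two sentences with opposite meanings can collapse into the same vector:

```python
from collections import Counter

def bag_of_words(document: str) -> Counter:
    """Represent a document as unordered word counts (the Bag-of-Words model)."""
    return Counter(document.lower().split())

# Word order is discarded, so these opposite sentences look identical:
a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")
print(a == b)  # True: the model cannot tell them apart
```

This is exactly the critical limitation the module describes: counts capture topic, but not meaning or order.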

From Counting to Meaning: Embeddings & Attention
In 2013, word2vec delivered a breakthrough: position words in a vector space based on how they're used. Words in similar c…
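The "meaning vectors" idea can be illustrated with hand-made toy vectors. To be clear, these numbers are illustrative only, not real word2vec output: the three dimensions (loosely "royalty", "maleness", unused) and the cosine helper are my own assumptions, chosen so the famous king − man + woman ≈ queen analogy works out:

```python
import math

# Hand-made toy embeddings (NOT real word2vec output), purely to
# illustrate that nearby vectors encode related meanings.
vectors = {
    "king":  [0.9, 0.9, 0.0],
    "queen": [0.9, 0.1, 0.0],
    "man":   [0.1, 0.9, 0.0],
    "woman": [0.1, 0.1, 0.0],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Vector arithmetic: king - man + woman lands nearest to queen.
analogy = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
best = max(vectors, key=lambda word: cosine(analogy, vectors[word]))
print(best)  # queen
```

Real embeddings have hundreds of dimensions learned from usage data, but the geometry works the same way.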
Follow Nina as she discovers that an LLM is just two files: a 140GB parameters file and 500 lines of C code. Understand inference, the 100x compression ratio, and why the model is a black box you cannot debug by reading code.
Learning Goals
- Explain the physical model of an LLM: a parameters file and a run file.
- Describe inference as a loop of next-token prediction.
- Understand hallucination as a consequence of lossy compression, not a software bug.
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.

Nina's Black Box
Nina is a product engineer. She ships features, debugs production incidents, and reviews pull requests. Last quarter, he…
Just Two Files
Here's the first surprise: an open-source LLM like Llama 2 70B is literally two files on a computer.
The first file…
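The inference loop this module describes (a loop of next-token prediction) can be sketched in a few lines. Everything here is a hypothetical stand-in: the bigram lookup table plays the role of the parameters file, and `predict_next_token` plays the role of the run file's forward pass over billions of weights:

```python
# Toy stand-in for the parameters file: a bigram lookup table.
# A real model instead computes a probability distribution over the
# whole vocabulary from billions of learned weights.
PARAMETERS = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def predict_next_token(context: list[str]) -> str:
    """Stand-in for the forward pass: here it looks only at the last token."""
    return PARAMETERS.get(context[-1], "<end>")

def generate(prompt: list[str], max_tokens: int = 5) -> list[str]:
    """Inference is just this loop: predict the next token, append, repeat."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        nxt = predict_next_token(tokens)
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'on', 'the', 'cat']
```

The loop is the whole runtime story: the 500 lines of code implement this cycle, and everything interesting lives in the parameters.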
Understanding the Parameters
Where do those 140 billion numbers come from? Nobody writes them by hand. They're discovered through a massive compu…
Trace the evolution from an internet "dreamer" to a polished assistant. Understand the three stages — Pre-training (lossy compression), Fine-tuning (behavioral formatting), and RLHF (human preference polish) — and why alignment improves tone but not accuracy.
Learning Goals
- Explain Pre-training as lossy compression of the internet into a base model.
- Describe how Fine-tuning and RLHF transform a dreamer into a helpful assistant.
- Recognize the alignment tax: fluent delivery does not equal factual accuracy.
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.

Nina's Misbehaving Model
Nina downloads an open-source base model to test locally. She types: "What is the best way to handle database migrations…
Stage 1: Pre-training (The Dreamer)
Pre-training is the first and most expensive stage. A cluster of thousands of GPUs processes roughly 10 terabytes of int…
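The compression framing invites a quick back-of-the-envelope check, using this course's round numbers (roughly 10 TB of internet text in, a 140 GB parameters file out). The figures are order-of-magnitude, so the ratio is too:

```python
# Round numbers from this course (order-of-magnitude only):
training_text_gb = 10_000  # ~10 TB of internet text
parameters_gb = 140        # Llama 2 70B: ~70e9 params x 2 bytes (float16)

ratio = training_text_gb / parameters_gb
print(f"~{ratio:.0f}x smaller")  # ~71x smaller
```

The arithmetic lands near 70x; the "100x" figure quoted elsewhere in the course is the same result rounded to an order-of-magnitude shorthand. Either way, most of the detail in the training data cannot survive, which is why the compression is lossy.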
Hallucination Is Compression
Why do LLMs hallucinate? Because they're running lossy decompression.
When you ask the model a question, it doesn't…
Discover why scaling laws drive the compute arms race, and why bigger models still can't do math. Understand what scaling fixes (general capability) and what it doesn't (arithmetic, real-time data, hallucination) — setting up the architectural patterns you'll learn in later courses.
Learning Goals
- Understand scaling laws as a predictable investment curve for general capability.
- Distinguish between general capability gaps (scaling helps) and architectural gaps (scaling alone cannot fix).
- Recognize that arithmetic, real-time data, and hallucination require architectural solutions beyond scaling.
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.

Nina's Calculator Problem
Nina's support assistant handles text queries brilliantly. But when a customer asks: "What's my prorated refund if I can…
Scaling Laws: Predictable Investment
Why are companies spending billions on bigger models? Because of a remarkable discovery: scaling laws.
In 2020, Ope…
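The scaling-law result is commonly summarized as a power law. The form below follows the widely cited 2020 formulation, with $N$ the parameter count and $N_c$, $\alpha_N$ fitted constants; the constants quoted are approximate and vary between studies:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076,\; N_c \approx 8.8 \times 10^{13}
```

Because the exponent is small but stable across many orders of magnitude, doubling $N$ buys a predictable few-percent drop in loss, which is what turns the spending from a gamble into an investment curve.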
What Scaling Actually Buys You
Scaling isn't useless — it's dramatically powerful for the right problems. Each generation of larger models demonstrably…
Consolidate the four mental models into a practical builder's toolkit. Learn three rules for production LLMs, and chart your path to the next courses in the roadmap.
Learning Goals
- Synthesize the four core mental models (the arc, two-file reality, training pipeline, scaling & limits) into a unified toolkit.
- Apply three production rules: never trust the model's memory, math, or confidence.
- Identify which advanced topics (tokens & embeddings, prompt engineering, RAG, agents) address which gaps.
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.

Nina's Toolkit
Four modules. Four mental models. Nina started with a question from her VP — "How does this thing actually work?" — and…
Three Rules for Building with LLMs
Nina distilled everything she learned into three rules she applies to every LLM feature she builds.
**Rule 1: Never tru…
What's Next — and Why Each Course Exists
You have the mental models. Now each gap in your toolkit maps to a course that teaches you to fill it.
Remember Nina's…