How LLMs Work

Build an intuition-first understanding of Large Language Models. Follow Nina, a product engineer, from the physical "two-file" reality through training, scaling, and the limits of what models can do on their own.

Why take this course?

A radically transparent demystification of AI. Strip away the hype and explore the anatomy, evolution, and limits of modern LLMs through direct analogies, visual models, and a hands-on engineering narrative.

Course Modules

Module 1: Why LLMs Exist (The History)

Before the two-file reality, there were decades of failed attempts. Follow the arc from counting words (Bag-of-Words) to meaning vectors (word2vec) to contextual attention to the Transformer breakthrough, and understand why decoder-only models like GPT became the LLMs we know today.

Learning Goals

Explain why computers need numbers to process language, and how each approach solved the failures of the last.
Describe the Bag-of-Words model, its strengths, and its critical limitations.
Understand how embeddings and attention solved the context problem (explored in depth in later courses).
Understand how the Transformer solved the sequential bottleneck of RNNs.
Distinguish encoder-only (BERT) from decoder-only (GPT) Transformers and when each is appropriate.

Concept Card Preview

Visuals, diagrams, and micro-interactions you'll see in this module.

The Fundamental Problem: Computers Need Numbers

Nina notices something odd. She asks her model to summarize two articles — one about a "river bank" and one about a "ban…

Bag-of-Words: The Naive First Attempt

The earliest approach was brutally simple: count the words.

In the Bag-of-Words model, a document becomes a vec…
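As a rough illustration of "count the words", here is a minimal sketch. The vocabulary and example sentences are made up for this example, and splitting on whitespace is a simplifying assumption rather than how any particular library tokenizes.

```python
from collections import Counter


def bag_of_words(document: str, vocabulary: list[str]) -> list[int]:
    """Return one count per vocabulary word; word order is thrown away."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and documents, chosen only for illustration.
vocabulary = ["bank", "bites", "dog", "man", "river"]

print(bag_of_words("river bank", vocabulary))     # [1, 0, 0, 0, 1]
print(bag_of_words("dog bites man", vocabulary))  # [0, 1, 1, 1, 0]
print(bag_of_words("man bites dog", vocabulary))  # [0, 1, 1, 1, 0]
```

The last two sentences mean opposite things but produce the same vector, which is the kind of limitation this module goes on to discuss.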

From Counting to Meaning: Embeddings & Attention

In 2013, word2vec delivered a breakthrough: position words in a vector space based on how they're used. Words in similar c…

Module 2: The "Two-File" Reality (The Anatomy)

Follow Nina as she discovers that an LLM is just two files: a 140 GB parameters file and about 500 lines of C code. Understand inference, the 100x compression ratio, and why the model is a black box you cannot debug by reading code.

Learning Goals

Explain the physical model of an LLM: a parameters file and a run file.
Describe inference as a loop of next-token prediction.
Understand hallucination as a consequence of lossy compression, not a software bug.
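To make "a loop of next-token prediction" concrete, here is a toy sketch. The lookup table is a hypothetical stand-in for the parameters file, and the loop is greedy and looks only at the last token. A real LLM conditions on the whole sequence and usually samples from its predicted distribution.

```python
# Hypothetical stand-in for the parameters file:
# for each token, a score for every possible next token.
TOY_PARAMETERS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 1.0},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"<end>": 1.0},
    "sat": {"<end>": 1.0},
}


def predict_next(token: str) -> str:
    """Pick the highest-scoring next token (greedy decoding)."""
    candidates = TOY_PARAMETERS[token]
    return max(candidates, key=candidates.get)


def generate(max_tokens: int = 10) -> list[str]:
    """Run the inference loop: predict a token, append it, repeat."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        next_token = predict_next(tokens[-1])
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(generate())  # ['the', 'cat', 'sat']
```

The "run file" plays the role of `generate` here: a short loop whose behavior is determined almost entirely by the numbers it reads.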

Concept Card Preview

Visuals, diagrams, and micro-interactions you'll see in this module.

Nina's Black Box

Nina is a product engineer. She ships features, debugs production incidents, and reviews pull requests. Last quarter, he…

Just Two Files

Here's the first surprise: an open-source LLM like Llama 2 70B is literally two files on a computer.

The first file…

Understanding the Parameters

Where do those 70 billion numbers come from? Nobody writes them by hand. They're discovered through a massive compu…
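The 140 GB file size is consistent with a 70-billion-parameter model such as Llama 2 70B if each parameter takes 2 bytes. The 2-byte figure is an assumption here (16-bit floating point storage), and the sketch uses decimal gigabytes.

```python
PARAMETER_COUNT = 70_000_000_000  # the "70B" in Llama 2 70B
BYTES_PER_PARAMETER = 2           # assumption: each number stored as a 16-bit float

file_size_gb = PARAMETER_COUNT * BYTES_PER_PARAMETER / 1e9
print(file_size_gb)  # 140.0
```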

Module 3: The Training Pipeline (The Evolution)

Trace the evolution from an internet "dreamer" to a polished assistant. Understand the three stages — Pre-training (lossy compression), Fine-tuning (behavioral formatting), and RLHF (human preference polish) — and why alignment improves tone but not accuracy.

Learning Goals

Explain Pre-training as lossy compression of the internet into a base model.
Describe how Fine-tuning and RLHF transform a dreamer into a helpful assistant.
Recognize the alignment tax: fluent delivery does not equal factual accuracy.

Concept Card Preview

Visuals, diagrams, and micro-interactions you'll see in this module.

Nina's Misbehaving Model

Nina downloads an open-source base model to test locally. She types: "What is the best way to handle database migrations…

Stage 1: Pre-training (The Dreamer)

Pre-training is the first and most expensive stage. A cluster of thousands of GPUs processes roughly 10 terabytes of int…
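A quick back-of-the-envelope check of the compression idea, using the two round figures from this course (roughly 10 terabytes of training text and a 140 GB parameters file). Both numbers are approximate, so treat the result as an order-of-magnitude estimate.

```python
TRAINING_TEXT_BYTES = 10e12     # roughly 10 TB, as quoted above
PARAMETERS_FILE_BYTES = 140e9   # the 140 GB parameters file from Module 2

compression_ratio = TRAINING_TEXT_BYTES / PARAMETERS_FILE_BYTES
print(round(compression_ratio))  # 71
```

That lands in the same order of magnitude as the "100x" figure, and a ratio that large is only possible if the compression is lossy.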

Hallucination Is Compression

Why do LLMs hallucinate? Because they're running lossy decompression.

When you ask the model a question, it doesn't…

Module 4: Scaling & Limits

Discover why scaling laws drive the compute arms race, and why bigger models still can't do math. Understand what scaling fixes (general capability) and what it doesn't (arithmetic, real-time data, hallucination) — setting up the architectural patterns you'll learn in later courses.

Learning Goals

Understand scaling laws as a predictable investment curve for general capability.
Distinguish between general capability gaps (scaling helps) and architectural gaps (scaling alone cannot fix).
Recognize that arithmetic, real-time data, and hallucination require architectural solutions beyond scaling.

Concept Card Preview

Visuals, diagrams, and micro-interactions you'll see in this module.

Nina's Calculator Problem

Nina's support assistant handles text queries brilliantly. But when a customer asks: "What's my prorated refund if I can…

Scaling Laws: Predictable Investment

Why are companies spending billions on bigger models? Because of a remarkable discovery: scaling laws.

In 2020, Ope…
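Scaling laws are commonly written as power laws: loss falls smoothly and predictably as model size grows. The sketch below shows only the shape of such a curve. The `scale` and `exponent` values are hypothetical placeholders, not fitted constants from any paper.

```python
def predicted_loss(parameter_count: float,
                   scale: float = 1e13,      # hypothetical constant
                   exponent: float = 0.08    # hypothetical constant
                   ) -> float:
    """Power-law form: loss = (scale / parameter_count) ** exponent."""
    return (scale / parameter_count) ** exponent


# Each 10x increase in parameters shrinks the loss by the same factor.
for parameter_count in (1e9, 1e10, 1e11):
    print(f"{parameter_count:.0e} parameters -> loss {predicted_loss(parameter_count):.3f}")
```

The constant improvement per 10x step is what makes the curve "predictable": the next, larger model's loss can be estimated before anyone pays to train it.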

What Scaling Actually Buys You

Scaling isn't useless — it's dramatically powerful for the right problems. Each generation of larger models demonstrably…

Module 5: Your LLM Engineering Toolkit

Consolidate the four mental models into a practical builder's toolkit. Learn three rules for production LLMs, and chart your path to the next courses in the roadmap.

Learning Goals

Synthesize the four core mental models (the arc, two-file reality, training pipeline, scaling & limits) into a unified toolkit.
Apply three production rules: never trust the model's memory, math, or confidence.
Identify which advanced topics (tokens & embeddings, prompt engineering, RAG, agents) address which gaps.

Concept Card Preview

Visuals, diagrams, and micro-interactions you'll see in this module.

Nina's Toolkit

Four modules. Four mental models. Nina started with a question from her VP — "How does this thing actually work?" — and…

Three Rules for Building with LLMs

Nina distilled everything she learned into three rules she applies to every LLM feature she builds.

**Rule 1: Never tru…

What's Next — and Why Each Course Exists

You have the mental models. Now each gap in your toolkit maps to a course that teaches you to fill it.

Remember Nina's…