April 19, 2026

AI LLMs Tokenization Embeddings NLP

The Wall Between You and the Model: Tokens, Encoders, and Embeddings

GPT-4 can't count letters. Here's why — and how embeddings give token IDs the meaning the tokenizer strips away.

Johannes Hayer

Building ai-in-a-shell

