What Actually Happens Inside a Transformer Block

Attention gets all the press. But a transformer block is more than attention — there's a feedforward network that holds most of the parameters, two residual connections, and a normalisation design that determines whether large-scale training is stable. Here's all of it, in order.

Johannes Hayer avatar

Johannes Hayer

Building ai-in-a-shell

Related articles

Learn it properly

Practice the AI Native Engineer Roadmap

Turn the article into concept cards, Socratic questions, and an AI tutor session that checks whether the model actually holds in your head.