The Kernel Engineering Book
The complete, free knowledge base behind Vizuara's Kernel Engineering — 72 illustrated worklog chapters across seven parts, each written in the hypothesis → measure → figure rhythm and cross-linked to the chapters it needs. Start anywhere.
How to read this book →
The mental models. Why kernels decide who wins, how to think in speed-of-light terms, and the exact skill map a GPU kernel engineer is hired for.
A tour of the H100/B200 from the die down to the wire. Every component, what it costs, and why it exists — the vocabulary the rest of the site assumes.
From the abstract launch to the metal. Threads to grids, the compilation story, and the primitives you build every kernel out of.
The heart of the course. We rebuild matrix multiply from a 1.3%-of-cuBLAS naive kernel to a 94% warptiled monster — one optimization, one measurement, one figure at a time. Then we do it again on tensor cores.
Where the GEMM skills meet real LLMs. Fusion, softmax, attention, FlashAttention, the KV cache and the quantized kernels that serve tokens at scale.
The cutting edge, as of now. Hopper's async engine, Blackwell's tensor memory and NVFP4, DeepSeek's open kernels, CUTLASS the hard way, and how to debug when it all breaks.
The newest frontier: can models write the kernels? KernelBench, test-time search, RL, and the honest picture of where AI-generated kernels win and where they still fail.
