Vizuara Kernel Engineering
Vizuara AI Labs

From silicon to speculative decoding — write GPU kernels that actually run modern LLMs. A worklog from the silicon up to FlashAttention, NVFP4, DeepSeek's DSpark, and AI-generated kernels — every step measured, profiled, and drawn by hand.

01 The Book
The full knowledge base: a 72-chapter illustrated worklog from the silicon up to FlashAttention, NVFP4 and DeepSeek's DSpark. Free to read, forever.
72 chapters · 236 figures

02 The Workshop
Vizuara's live Kernel Engineering cohort: 8 foundational lectures + 6 deep-dive workshops on modern kernel-inference topics, with the full book included.
8 lectures · 6 workshops

03 Projects
Build real kernels with your hands: the GPU-Puzzles track, a GEMM you take to 94% of cuBLAS, FlashAttention from scratch, and the You-vs-the-machine capstone.
guided builds

04 Interactive
Practice, don't just read: per-section quizzes, the guided GPU-Puzzles track, and a growing set of hands-on kernel challenges.
quizzes · puzzles

Built around what you're actually hired to do

Matmul from scratch to 94% of cuBLAS · the same ladder on tensor cores · reading SASS & Nsight Compute · TMA/WGMMA on Hopper · NVFP4 & TMEM on Blackwell · Triton and real CUTLASS · FlashAttention · the vLLM debugging workflow · and LLM-driven kernel search — knowing where it wins and where it still fails.

The kernel engineer's skill map →