Vizuara AI Labs

Vizuara Kernel Engineering

From silicon to speculative decoding: write GPU kernels that actually run modern LLMs.

A worklog from the silicon up to FlashAttention, NVFP4, DeepSeek's DSpark, and AI-generated kernels, with every step measured, profiled, and drawn by hand.

Read the book → The workshop

The Book

The full knowledge base: a 72-chapter illustrated worklog from the silicon up to FlashAttention, NVFP4, and DeepSeek's DSpark. Free to read, forever.

72 chapters · 236 figures →

The Workshop

Vizuara's live Kernel Engineering cohort: 8 foundational lectures + 6 deep-dive workshops on modern kernel-inference topics, with the full book included.

8 lectures · 6 workshops →

Projects

Build real kernels with your hands — the GPU-Puzzles track, a GEMM you take to 94% of cuBLAS, FlashAttention from scratch, and the You-vs-the-machine capstone.

guided builds →

Interactive

Practice, don't just read: per-section quizzes, the guided GPU-Puzzles track, and a growing set of hands-on kernel challenges.

quizzes · puzzles →

Built around what you're actually hired to do

Matmul from scratch to 94% of cuBLAS · the same ladder on tensor cores · reading SASS & Nsight Compute · TMA/WGMMA on Hopper · NVFP4 & TMEM on Blackwell · Triton and real CUTLASS · FlashAttention · the vLLM debugging workflow · and LLM-driven kernel search — knowing where it wins and where it still fails.

The kernel engineer's skill map →