06 AI × Kernels
The newest frontier: can models write the kernels? KernelBench, test-time search, RL, and the honest picture of where AI-generated kernels win and where they still fail.
KernelBench & measuring AI kernels
The benchmark, the fast_p metric, and why 'correct and faster than PyTorch' is a surprisingly hard bar.
Monkeys & search: test-time scaling
Sampling 100 kernels, iterative feedback, and evolutionary/tree search — how DeepSeek-V3 went 4% → 72%.
The CRFM experiments
Natural-language optimization ideas + parallel branching: LayerNorm at 484% of PyTorch — and FlashAttention at 9%.
Kevin, RL & what's still human
Multi-turn RL with verifiable rewards, the KernelBook dataset, and why the human+AI+profiler loop is the real answer.
