06 AI × Kernels

The newest frontier: can models write the kernels? KernelBench, test-time search, RL, and the honest picture of where AI-generated kernels win and where they still fail.

KernelBench & measuring AI kernels

The benchmark, the fast_p metric, and why 'correct and faster than PyTorch' is a surprisingly hard bar.

Monkeys & search: test-time scaling

Sampling 100 kernels, iterative feedback, and evolutionary/tree search — how DeepSeek-V3 went 4% → 72%.

The CRFM experiments

Natural-language optimization ideas + parallel branching: LayerNorm at 484% of PyTorch's performance, but FlashAttention at only 9%.

Kevin, RL & what's still human

Multi-turn RL with verifiable rewards, the KernelBook dataset, and why the human+AI+profiler loop is the real answer.