We ran KLUE against the same live production target four times, once each behind GLM 5.2, DeepSeek V4 Pro, Kimi K2.7 and Opus 4.8, all on an identical thirty-minute budget. Same target, same clock, four risk verdicts ranging from Low to Critical. A field report on what each model sees, what it walks past, whether it got the severity right, and why the cheapest run came back with the most.