Published on May 2, 2026

Could Kimi K2.6 Hold Its Own Against Claude Opus on Real Pentest Work?

Tags: ai-pentesting, klue, benchmark, kimi-k2, claude-opus, ptaas, autonomous-security, appsec, pentesting

We ran four frontier models through the same autonomous pentest engagement. Recall, time to finish, and dollars per run each tell a different story, and Kimi K2.6 turned out to be the surprise on the leaderboard.