KLUE · Autonomous Reasoning Agent

Meet KLUE. The AI pentest platform that hacks you before they do.

KLUE finds and helps you patch vulnerabilities in hours, not days. One AI platform that runs full penetration tests while also covering your cloud, code analysis, and live app testing, so your whole business stays secure.

Public Find · 6.5-Hour Run

5 CVE-level flaws found

Responsibly reported in popular open-source software.

Critical · 48-Hour Project

57+ issues confirmed

Each one proven with a real, working exploit.

Benchmark · Code Review

100% accuracy

Caught 76.5% of all issues, each checked in seconds.

Full-access · Case 01

11 findings

Server takeover caught before launch

Source-code review before go-live. Eleven findings caught and chained on the staging environment.

Pre-launch SaaS
Partial-access · Case 02

37 min

Account takeover

Coordinated disclosure with NCERT. Seven chained findings across login and session weaknesses.

Ministry of IT · NCERT
No-access · Case 03

61 min

Full database read

Attacker-style test from the outside. From the first request to the most sensitive data, end to end.

Public sector

We ran the benchmark.
KLUE sat above the curve.

Rule-based tools face a built-in trade-off: catch more bugs and you raise more false alarms, stay quiet and you miss more. That trade-off keeps them pinned to a single curve. A reasoning engine is not bound by it.
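The trade-off above can be sketched in a few lines of Python. This is an illustration, not KLUE's scoring code: the alert names, confidence values, and thresholds are made up, and only the definitions of precision and recall are standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    precision: float  # share of raised alerts that are real bugs
    recall: float     # share of real bugs that were raised


# Hypothetical rule-based scanner output: alert id -> rule confidence.
ALERTS = {"bug-1": 0.9, "bug-2": 0.7, "noise-1": 0.6, "bug-3": 0.4, "noise-2": 0.3}
# Hypothetical hand-labeled ground truth. bug-4 matches no rule at all.
GROUND_TRUTH = {"bug-1", "bug-2", "bug-3", "bug-4"}


def score(flagged: set[str], ground_truth: set[str]) -> Score:
    """Compute precision and recall for one set of raised alerts."""
    true_positives = len(flagged & ground_truth)
    # Assumed convention: no alerts raised counts as perfect precision.
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / len(ground_truth) if ground_truth else 1.0
    return Score(precision, recall)


def score_at(threshold: float) -> Score:
    """Raise every alert whose rule confidence meets the threshold."""
    flagged = {name for name, conf in ALERTS.items() if conf >= threshold}
    return score(flagged, GROUND_TRUTH)


if __name__ == "__main__":
    for threshold in (0.65, 0.25):
        s = score_at(threshold)
        print(f"threshold={threshold}: precision={s.precision:.2f} recall={s.recall:.2f}")
```

In this toy data, loosening the threshold from 0.65 to 0.25 lifts recall from 0.50 to 0.75 while precision falls from 1.00 to 0.60. The bug that no rule describes stays missed at every threshold.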

[Chart: recall (%) against precision (%), both axes 0 to 100. On the rule-based ceiling: Scanner A · 9.5%, Scanner B · 61.1%, Scanner C · 72.7%. Above the curve: KLUE · 100% precision, 76.5% recall.]

Precision

100%

Zero false alarms across the public source-code review benchmark.

Recall

76.5%

Above the ceiling rule-based tools reach in practice.

Ground truth

48 vulns

Hand-labeled before scoring. Zero ambiguity.

Validation time

7-12 sec

Per finding, including a working exploit proof.

Most security tools look for what they already know.

A rule. A signature. A playbook of attacks someone wrote down years ago. The work of finding new ways in has been quietly outsourced to a library of old ways in.

The Unspoken Contract

"We will find what is already known. The rest is your problem."

Every product in the category, restated honestly. Useful as a baseline. Inadequate as a security program.

Every scanner · Every signature · Every playbook

Every code scanner, every vulnerability scanner, every breach and attack simulator on the market today shares one underlying engine. They look for patterns they were told to look for. They catch what their rules describe. Everything else slips past.

That worked when threats were a slow-moving library. It does not work when the bug you need to catch was written by your team yesterday, with no rule yet to describe it. And it does not work when the weakness itself is new. There is no rule for an AI feature that quietly opens a path into your database. There is no playbook entry for two safe steps run in the wrong order. Those are judgments about intent. Intent is not something a pattern can encode.

Three engagements. Three case studies.

Real targets. Real attack paths. Every step confirmed with a working proof. Nothing reported on suspicion alone.

Public Sector Web Application · No Access

Eleven findings. Sixty-one minutes. One database, fully compromised.

A public-facing citizen services portal. We shared no list of its pages or systems in advance. The agent mapped out the site on its own, the way a human tester would, and walked out with the database.

11

Findings

61 min

Duration

Crown jewel

Full database read

A database injection flaw (blind SQLi) chained up to full admin rights.

The chain

  1. Map the site.
  2. Probe for weak spots.
  3. Prove the exploit.
  4. Report.
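For readers new to the term, here is a minimal sketch of what a blind SQL injection looks like, using Python's built-in sqlite3 module. It is not the portal's code: the `citizens` table, the function names, and the probe strings are all hypothetical. "Blind" means the page never shows query results, only a yes or no, and that yes or no is enough to read data one character at a time.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with one hypothetical record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE citizens (username TEXT, secret TEXT)")
    conn.execute("INSERT INTO citizens VALUES ('alice', 's3cret')")
    return conn


def exists_unsafe(conn: sqlite3.Connection, username: str) -> bool:
    """Vulnerable: user input is pasted straight into the SQL text."""
    query = f"SELECT 1 FROM citizens WHERE username = '{username}'"
    return conn.execute(query).fetchone() is not None


def exists_safe(conn: sqlite3.Connection, username: str) -> bool:
    """Fixed: the input is bound as a parameter and never parsed as SQL."""
    query = "SELECT 1 FROM citizens WHERE username = ?"
    return conn.execute(query, (username,)).fetchone() is not None


# Hypothetical probes: each one asks a yes/no question about alice's secret.
PROBE_FIRST_CHAR_IS_S = (
    "nobody' OR (SELECT substr(secret, 1, 1) FROM citizens "
    "WHERE username = 'alice') = 's"
)
PROBE_FIRST_CHAR_IS_X = (
    "nobody' OR (SELECT substr(secret, 1, 1) FROM citizens "
    "WHERE username = 'alice') = 'x"
)

if __name__ == "__main__":
    db = make_db()
    print(exists_unsafe(db, PROBE_FIRST_CHAR_IS_S))  # the answer leaks one character
    print(exists_unsafe(db, PROBE_FIRST_CHAR_IS_X))
    print(exists_safe(db, PROBE_FIRST_CHAR_IS_S))    # treated as a plain username
```

The parameterized version closes the hole because the probe is compared as a literal username instead of being executed as part of the query.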

Outcome

Critical findings fixed within the same week. The cross-site sharing rule and the extra-fields flaw (mass assignment) were patched first. The database account was stripped of admin rights the same day the report shipped.
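The extra-fields flaw (mass assignment) mentioned above follows a simple pattern. A minimal sketch, assuming a hypothetical `User` record and update handlers; none of these names come from the engagement itself.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str
    is_admin: bool = False


# Assumed allow-list: the only fields a profile update may touch.
ALLOWED_FIELDS = {"name", "email"}


def update_unsafe(user: User, payload: dict) -> None:
    """Vulnerable: every field the client sends is copied onto the record."""
    for field, value in payload.items():
        setattr(user, field, value)


def update_safe(user: User, payload: dict) -> None:
    """Fixed: only allow-listed fields are copied; extras are ignored."""
    for field in payload.keys() & ALLOWED_FIELDS:
        setattr(user, field, payload[field])


if __name__ == "__main__":
    request_body = {"email": "new@example.com", "is_admin": True}

    victim = User("sam", "old@example.com")
    update_unsafe(victim, request_body)
    print(victim.is_admin)  # the extra field went through

    patched = User("sam", "old@example.com")
    update_safe(patched, request_body)
    print(patched.is_admin, patched.email)
```

An allow-list is used here rather than a block-list so that any field added to the record later stays protected by default.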

Ministry of IT Portal · Coordinated with NCERT

Seven findings, one chain, full account takeover.

A national portal run under the Ministry of Information Technology. The agent found a chain that reached every account on the system without ever entering a password. Disclosure was coordinated with the National Computer Emergency Response Team (NCERT).

7

Findings

37 min

Duration

Crown jewel

Account takeover

Any official, by username. No password required.

The chain

  1. View any account without logging in.
  2. List every account.
  3. Password reset handed out a real login.
  4. Become an administrator.

Outcome

Coordinated disclosure with NCERT. A login check was added to the four critical pages within forty-eight hours.

CVSS 9.8
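A login check of the kind described above is often a thin wrapper around each sensitive page. A minimal sketch, assuming a request modeled as a plain dictionary with a hypothetical `session_user` key; the portal's real framework and handlers are not known.

```python
from functools import wraps


class Unauthorized(Exception):
    """Raised when a protected page is requested without a session."""


def login_required(handler):
    """Reject the request before the page handler runs if nobody is logged in."""
    @wraps(handler)
    def wrapper(request: dict, *args, **kwargs):
        if not request.get("session_user"):
            raise Unauthorized("login required")
        return handler(request, *args, **kwargs)
    return wrapper


@login_required
def view_account(request: dict, username: str) -> str:
    """Hypothetical account page. Without the decorator, anyone could call it."""
    return f"account:{username}"


if __name__ == "__main__":
    print(view_account({"session_user": "bob"}, "alice"))
    try:
        view_account({}, "alice")
    except Unauthorized as err:
        print("blocked:", err)
```

A check like this only proves that someone is logged in. Whether that user may view this particular account is a separate authorization check.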

Case Study II · Engagement record

Cybersecurity Training Platform · Pre-launch Full-access Review

Eleven findings before go-live. Server takeover caught on staging.

A hands-on training platform two weeks out from public launch. The agent was pointed at the code on the dev branch and surfaced issues the usual automated checks had not flagged, including a path to take over the whole server through one of its containers.

11

Findings

Full-access

Duration

Crown jewel

Server takeover

A container was given keys to its host and run with full privileges.

The chain

  1. A container had keys to its own host.
  2. Any website could act as a logged-in user.
  3. A secret key was hard-coded as a fallback.
  4. Commands were run through the system shell.

Outcome

All four high-severity issues fixed on the dev branch before launch. The container's access to its host was locked down to a narrow, controlled path. The container shipped with reduced privileges. Launch went out on schedule with no findings outstanding.
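Two of the issues in the chain above, the hard-coded fallback key and commands run through the system shell, have well-known fixes. A minimal sketch, assuming a hypothetical `SECRET_KEY` environment variable and helper names; it is not the platform's actual patch.

```python
import os
import subprocess
import sys
from collections.abc import Mapping


def load_secret_key(env: Mapping[str, str] = os.environ) -> str:
    """Fail closed: refuse to start rather than fall back to a built-in key."""
    key = env.get("SECRET_KEY")
    if not key:
        raise RuntimeError("SECRET_KEY is not set; refusing to start")
    return key


def echo_without_shell(user_input: str) -> str:
    """Pass input as a single argument, so no shell ever parses it.

    The child process is a tiny Python program that prints its first
    argument back. It stands in for whatever command the app really runs.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # With a shell, "; echo pwned" would run as a second command.
    print(echo_without_shell("report; echo pwned"))
    try:
        load_secret_key({})
    except RuntimeError as err:
        print(err)
```

Passing an argument list instead of one command string means shell metacharacters in user input arrive as ordinary text.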

Case Study III · Engagement record

Pricing

Plans from $150/month. Services quoted to fit.

Self-serve KLUE subscriptions for continuous testing, or a scoped quote for pentest, red team, and managed security. See the full breakdown.

Review pricing

One agent. Four jobs.

Source-code review, live app testing, cloud audit, and a full attacker-style pentest: same reasoning engine, same isolated runtime, same report format. Pick what to test; the agent does the rest.

Full-access & attacker-style tests

The full range of testing in one engine. Attacker-style from the outside, source-code review, or a partial-access hybrid, all on the same reasoning runtime.

  • Attacker-style test from outside
  • Full-access source + infrastructure
  • Partial-access hybrid runs
  • Exploits proven in a real browser

Source-code review & live app testing

Source-code review that follows how data moves, paired with live testing against running web apps and APIs. The logic bugs scanners miss.

  • Traces how data moves through code
  • Live web and API testing
  • Business logic and login bypass
  • Working proof you can verify

Cloud & Microsoft 365

Checks how your cloud and Microsoft 365 are set up. Spots sprawling access, things left public, and gaps in your sign-in rules.

  • AWS / Azure / Google Cloud setup
  • Microsoft 365 tenant assessment
  • Sprawling user access and roles
  • CIS / NIST framework mapping

Code review

Connect a repository. The agent follows where untrusted input can reach, spots custom logic bugs, and checks your infrastructure configuration before deploy.

  • Custom logic flaws
  • Tracing where bad input lands
  • Terraform / infrastructure config review
  • Supply chain and dependency risks
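Tracing where bad input lands can be pictured as a reachability search over a data-flow graph. A toy sketch under that assumption, not a description of how KLUE works internally; the graph, node names, and function below are hypothetical.

```python
from collections import deque

# Hypothetical data-flow edges: value at the key flows into each listed node.
FLOWS = {
    "http_param:id": ["build_query"],
    "build_query": ["db.execute"],
    "config:timeout": ["set_timeout"],
}


def reachable_sinks(flows: dict[str, list[str]],
                    sources: set[str],
                    sinks: set[str]) -> set[str]:
    """Breadth-first search from untrusted sources to dangerous sinks."""
    seen = set(sources)
    queue = deque(sources)
    hits = set()
    while queue:
        node = queue.popleft()
        if node in sinks:
            hits.add(node)
        for nxt in flows.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return hits


if __name__ == "__main__":
    untrusted = {"http_param:id"}
    dangerous = {"db.execute", "os.system"}
    print(reachable_sinks(FLOWS, untrusted, dangerous))
```

Here the request parameter reaches `db.execute` through `build_query`, so that path is worth a closer look. The config value never touches a dangerous sink.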

Thirty minutes. Not four hours. Not weeks.

How much you catch is half the story. How long it takes is the other half. Most AI testing tools take four to six hours a run, which means you can only afford to test once a quarter. A thirty-minute run can fit into every release.

Tool | Recall | Duration | Soft Positive Rate
KLUE (reasoning agent, thirty-minute budget) | 82% | ~30 min | ~4%
Leading commercial agent (combines several AI models) | 75.0% | 4 hr | 6.3%
Raw frontier model (a top model on its own) | 70.0% | 10 min | 6.7%
Open-source agent A (same underlying AI model) | 45.0% | 4 hr | 10.0%
Open-source agent B (same underlying AI model) | 30.0% | 6 hr | 25.0%
Open-source agent C (same underlying AI model) | 5.0% | 2 hr | 0.0%

Public benchmark against a well-known target with twenty-two documented vulnerabilities. KLUE figures come from real runs under a thirty-minute budget. Other figures come from publicly reported results.

One

It is how the agent is built, not the AI behind it.

Three tools on the public board use the same underlying AI model and catch 45%, 30%, and 5% of bugs. That gap comes down to how each one is built. KLUE is the difference.

Two

Model choice becomes a cost decision.

On a separate benchmark, KLUE on a cheaper open model matched a top-tier paid model on the most critical bugs, at one-fifth the cost.

Three

Speed is what unlocks it.

Four-hour tests force you to a quarterly schedule. Thirty-minute tests run after every release. That is the bar serious autonomous tools will be measured against.

Three categories. The category decides the ceiling.

Every product in this space falls into one of three groups, set by what its engine actually does: matches known patterns, replays known attacks, or reasons about the target. The group sets the cap.

Capability | KLUE | Vulnerability scanners | Breach simulators | Other AI agents | Human pentesters
Engine
Underlying model | Reasoning | Rules | Simulated playbooks | Mixed agents | Human reasoning
Discovers unknown vulnerabilities: Partial
Writes custom exploit code: Per target · Preset · Preset
Chains findings into attack paths: Multi-step · Limited · Some · Manual
Delivery
Time per engagement | ~30 min to 6 hr | Always-on scan | Hours | Hours | 2 to 6 weeks
Output | PDF, data export, working proof | CSV, dashboard | Dashboard | Dashboard, PDF | PDF (weeks later)
Free retest after fix: Not applicable · Re-run · Re-run · Extra cost
Operations
Runs on every release: Known-flaw scan only · Scheduled · Scheduled · Impossible
Parallel scans | Unlimited | Unlimited, shallow | Scheduled | Scheduled | One per team
Vendor model | Proprietary, exclusive | Licensed software | Licensed software | Licensed software | Consulting hours

Most tools answer "what known weakness might you have?" KLUE answers "what would an attacker actually do here?" The category decides which question can be asked.

Closing

See what an hour finds.

Run KLUE against your own systems. One hour. Real exploits. Real fixes. No procurement cycle. No consulting engagement.