Most security tools look for what they already know.

A rule. A signature. A playbook of attacks someone wrote down years ago. The work of finding new ways in has been quietly outsourced to a library of old ways in.

The Unspoken Contract

"We will find what is already known. The rest is your problem."

Every product in the category, restated honestly. Useful as a baseline. Inadequate as a security program.

Every scanner · Every signature · Every playbook

Every code scanner, every vulnerability scanner, every breach and attack simulator on the market today shares one underlying engine. They look for patterns they were told to look for. They catch what their rules describe. Everything else slips past.

That worked when threats were a slow-moving library. It does not work when the bug you need to catch was written by your team yesterday, with no rule yet to describe it. And it does not work when the weakness itself is new. There is no rule for an AI feature that quietly opens a path into your database. There is no playbook entry for two safe steps run in the wrong order. Those are judgments about intent. Intent is not something a pattern can encode.

Three engagements. Three case studies.

Real targets. Real attack paths. Every step confirmed with a working proof. Nothing reported on suspicion alone.

Public Sector Web Application · No Access

Eleven findings. Sixty-one minutes. One database, fully compromised.

A public-facing citizen services portal. We shared no list of its pages or systems in advance. The agent mapped out the site on its own, the way a human tester would, and walked out with the database.

Findings: 11
Duration: 61 min
Crown jewel: Full database read. A database injection flaw (blind SQLi) chained up to full admin rights.
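An injection flaw of this kind usually comes from user input being pasted into the text of a query. The sketch below shows that root cause and the usual fix, a parameterized query, using Python's built-in `sqlite3` module. The `citizens` table and the `find_unsafe` and `find_safe` functions are hypothetical and are not taken from the engagement.

```python
import sqlite3

# Hypothetical in-memory table standing in for a real application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizens (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO citizens (name) VALUES ('alice'), ('bob')")


def find_unsafe(name: str) -> list:
    # Vulnerable: input is pasted into the SQL text, so it can change the query.
    return conn.execute(
        f"SELECT id, name FROM citizens WHERE name = '{name}'"
    ).fetchall()


def find_safe(name: str) -> list:
    # Parameterized: the driver passes the value separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM citizens WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
leaked = find_unsafe(payload)   # condition is always true, every row comes back
blocked = find_safe(payload)    # no citizen has this literal name
```

In a blind variant the attacker does not see rows directly and instead infers data from whether a condition like the one above evaluates true or false; the parameterized form closes both.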

The chain

01 Map the site.
02 Probe for weak spots.
03 Prove the exploit.
04 Report.

Outcome

Critical findings fixed within the same week. The cross-site sharing rule and the extra-fields flaw (mass assignment) were patched first. The database account was stripped of admin rights the same day the report shipped.
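The extra-fields flaw (mass assignment) is commonly closed with an allow-list of the fields a client may set. A minimal sketch, assuming a hypothetical profile-update handler; the field names and the `apply_profile_update` helper are illustrative and do not describe the portal's actual code.

```python
# Hypothetical allow-list: the only fields a client may change on its own profile.
ALLOWED_PROFILE_FIELDS = frozenset({"display_name", "email", "phone"})


def apply_profile_update(record: dict, submitted: dict) -> dict:
    """Return a copy of record updated only with allow-listed fields.

    Extra fields such as "is_admin" are dropped instead of being copied
    onto the record, which is what a mass assignment flaw would allow.
    """
    safe_changes = {
        key: value for key, value in submitted.items()
        if key in ALLOWED_PROFILE_FIELDS
    }
    return {**record, **safe_changes}


user = {"display_name": "A. Citizen", "email": "a@example.org", "is_admin": False}
request_body = {"email": "new@example.org", "is_admin": True}  # attacker adds a field
updated = apply_profile_update(user, request_body)
```

The allow-list is the design choice that matters here: a deny-list of dangerous fields tends to miss the next privileged field someone adds to the model.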


Ministry of IT Portal · Coordinated with NCERT

Seven findings, one chain, full account takeover.

A national portal run under the Ministry of Information Technology. The agent found a chain that reached every account on the system without ever entering a password. Disclosure was coordinated with the National Computer Emergency Response Team (NCERT).

Findings: 7
Duration: 37 min
Crown jewel: Account takeover. Any official, by username. No password required.

The chain

01 View any account without logging in.
02 List every account.
03 Password reset handed out a real login.
04 Become an administrator.

Outcome

Coordinated disclosure with NCERT. A login check was added to the four critical pages within forty-eight hours. CVSS 9.8.
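A login check of the kind described in this outcome can be sketched as a wrapper that runs before each page handler. This is a framework-free illustration under stated assumptions: the `request` dictionary, the `login_required` decorator, and the `view_account` handler are hypothetical and do not describe the portal's actual fix.

```python
from functools import wraps


class NotLoggedIn(Exception):
    """Raised when a request carries no authenticated session."""


def login_required(handler):
    """Reject the request before the handler runs unless a user is attached."""
    @wraps(handler)
    def wrapper(request: dict, *args, **kwargs):
        if not request.get("user"):
            raise NotLoggedIn("authentication required")
        return handler(request, *args, **kwargs)
    return wrapper


@login_required
def view_account(request: dict, username: str) -> dict:
    # Hypothetical page handler; a real one would also check that the
    # logged-in user is allowed to see this particular account.
    return {"account": username, "viewer": request["user"]}
```

Calling `view_account({}, "official1")` raises `NotLoggedIn`, while a request with a `"user"` entry reaches the handler. A login check alone does not decide which accounts a logged-in user may view; that is a separate authorization check.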

Case Study II · Engagement record

Cybersecurity Training Platform · Pre-launch Full-access Review

Eleven findings before go-live. Server takeover caught on staging.

A hands-on training platform two weeks out from public launch. The agent was pointed at the code on the dev branch and surfaced issues the usual automated checks had not flagged, including a path to take over the whole server through one of its containers.

Findings: 11
Duration: Full-access
Crown jewel: Server takeover. A container was given keys to its host and run with full privileges.

The chain

01 A container had keys to its own host.
02 Any website could act as a logged-in user.
03 A secret key was hard-coded as a fallback.
04 Commands were run through the system shell.

Outcome

All four high-severity issues fixed on the dev branch before launch. The container's access to its host was locked down to a narrow, controlled path. The container shipped with reduced privileges. Launch went out on schedule with no findings outstanding.
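Two of the issues in the chain, the hard-coded fallback key and commands run through the system shell, have well-known remedies that can be sketched briefly. The sketch assumes a hypothetical `APP_SECRET_KEY` environment variable and hypothetical `load_secret_key` and `run_tool` helpers; it is not the platform's actual code.

```python
import os
import subprocess
import sys


def load_secret_key(env=os.environ) -> str:
    """Fail closed: no hard-coded fallback if the key is missing."""
    key = env.get("APP_SECRET_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("APP_SECRET_KEY is not set")
    return key


def run_tool(args: list) -> str:
    """Run a command from an argument list, without the system shell.

    With shell=False (the default) metacharacters such as ';' or '&&' in
    an argument are passed through as literal text, not interpreted.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# Usage: the "; echo injected" part stays inside a single argument.
output = run_tool(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "lab1; echo injected"]
)
```

Refusing to start without a key trades convenience for safety: a missing setting becomes a visible startup failure instead of a silently shared secret.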

Case Study III · Engagement record

One agent. Four jobs.

Source-code review, live app testing, cloud audit, and a full attacker-style pentest: same reasoning engine, same isolated runtime, same report format. Pick what to test; the agent does the rest.

Full-access & attacker-style tests

The full range of testing in one engine: attacker-style from the outside, source-code review, or a partial-access hybrid, all on the same reasoning runtime.

Attacker-style test from outside
Full-access source + infrastructure
Partial-access hybrid runs
Exploits proven in a real browser

Source-code review & live app testing

Source-code review that follows how data moves, paired with live testing against running web apps and APIs. The logic bugs scanners miss.

Traces how data moves through code
Live web and API testing
Business logic and login bypass
Working proof you can verify

Cloud & Microsoft 365

Checks how your cloud and Microsoft 365 are set up. Spots sprawling access, things left public, and gaps in your sign-in rules.

AWS / Azure / Google Cloud setup
Microsoft 365 tenant assessment
Sprawling user access and roles
CIS / NIST framework mapping

Code review

Connect a repository. The agent follows where untrusted input can reach, spots custom logic bugs, and checks your infrastructure configuration before deploy.

Custom logic flaws
Tracing where bad input lands
Terraform / infrastructure config review
Supply chain and dependency risks

Thirty minutes. Not four hours. Not weeks.

How much you catch is half the story. How long it takes is the other half. Many AI testing tools take four to six hours a run, which in practice limits testing to about once a quarter. A thirty-minute run can fit into every release.

Tool	Description	Recall	Duration	Soft Positive Rate
KLUE	Reasoning agent, thirty-minute budget	82%	~30 min	~4%
Leading commercial agent	Combines several AI models	75.0%	4 hr	6.3%
Raw frontier model	A top model on its own	70.0%	10 min	6.7%
Open-source agent A	Same underlying AI model	45.0%	4 hr	10.0%
Open-source agent B	Same underlying AI model	30.0%	6 hr	25.0%
Open-source agent C	Same underlying AI model	5.0%	2 hr	0.0%

Public benchmark against a well-known target with twenty-two documented vulnerabilities. KLUE figures come from real runs under a thirty-minute budget. Other figures come from publicly reported results.

One

It is how the agent is built, not the AI behind it.

Three tools on the public board use the same underlying AI model and catch 45%, 30%, and 5% of bugs. That gap comes down to how each one is built. KLUE is the difference.

Two

Model choice becomes a cost decision.

On a separate benchmark, KLUE on a cheaper open model matched a top-tier paid model on the most critical bugs, at one-fifth the cost.

Three

Speed is what unlocks it.

Four-hour tests force you to a quarterly schedule. Thirty-minute tests run after every release. That is the bar serious autonomous tools will be measured against.

Three categories. The category decides the ceiling.

Every product in this space falls into one of three groups, set by what its engine actually does: matches known patterns, replays known attacks, or reasons about the target. The group sets the cap.

Capability	KLUE	Vulnerability scanners	Breach simulators	Other AI agents	Human pentesters
Engine
Underlying model	Reasoning	Rules	Simulated playbooks	Mixed agents	Human reasoning
Discovers unknown vulnerabilities				Partial
Writes custom exploit code	Per target		Preset	Preset
Chains findings into attack paths	Multi-step		Limited	Some	Manual
Delivery
Time per engagement	~30 min to 6 hr	Always-on scan	Hours	Hours	2 to 6 weeks
Output	PDF, data export, working proof	CSV, dashboard	Dashboard	Dashboard, PDF	PDF (weeks later)
Free retest after fix		Not applicable	Re-run	Re-run	Extra cost
Operations
Runs on every release		Known-flaw scan only	Scheduled	Scheduled	Impossible
Parallel scans	Unlimited	Unlimited, shallow	Scheduled	Scheduled	One per team
Vendor model	Proprietary, exclusive	Licensed software	Licensed software	Licensed software	Consulting hours

Most tools answer "what known weakness might you have?" KLUE answers "what would an attacker actually do here?" The category decides which question can be asked.

Closing

See what an hour finds.

Run KLUE against your own systems. One hour. Real exploits. Real fixes. No procurement cycle. No consulting engagement.


Meet KLUE. Your AI Security Engineer that does Continuous Pentesting and Code Analysis.

We ran the benchmark.
KLUE sat above the curve.

Plans from $150/month. Services quoted to fit.
