Inside a 61 Minute Pentest: How KLUE Found 11 Critical Flaws in a Public Sector Web Application
NOTE
Disclaimer: All identifying details in this write up have been removed or abstracted to protect the client. Technology stacks, endpoint paths, parameter names, and data fields have been generalized. No proof of concept payloads that point at the real environment are included. The intent of this article is to share methodology and findings, not to point a finger at a specific application.
Something Was Off in the First Five Minutes
Most web application tests start slow. You pull the sitemap, eyeball the login flow, poke a few forms, and spend the first hour or two just getting oriented. That was not the shape of this engagement. Five minutes into the run, our autonomous pentesting agent KLUE had already flagged more useful attack surface than many human testers find in a full day. By the time the sixty first minute hit, eleven confirmed vulnerabilities were sitting in the report queue, including one that walked straight from an unauthenticated GET request to full database compromise.
This post is a case study of that run. It is written in the style a bug bounty hunter would use for a disclosure write up, because that is ultimately what it is: a sequence of observations, hunches, and confirmations that built into a chain. The only difference is that the operator here was an AI agent, not a human. Every finding was validated with a working proof of concept; every chain was reasoned out end to end.
The Target
The target was an internet facing application serving a public sector workflow. Think of it as the kind of portal where a citizen lodges a request, uploads a document, checks a status, and comes back a week later for an outcome. Boring on the surface. Very sensitive underneath.
A few shape cues without giving the target away:
- A modern single page application front end, talking to a REST style API behind a generic host name.
- A user base measured in the high five figures, with records that included names, identification numbers, and workflow state tied to real people.
- Standard scope: unauthenticated testing, authenticated testing with a low privilege account we created through normal signup, and a named list of in scope hostnames.
- No pre provided inventory of endpoints. KLUE had to find everything on its own.
That last point matters. A lot of "AI pentesting" demos today are really just LLM wrappers around a Swagger file. Give it the API schema, watch it roll through OWASP Top 10, and call it a day. Real engagements rarely look like that. In this case the client was not able to share a schema, so the agent had to behave like a human tester: map the attack surface from the outside, decide what to probe, and follow leads.
Minute by Minute: How the Run Played Out
The full run clocked in at 61 minutes. Below is a compressed timeline of what the agent actually did, grouped into phases.
Minutes 1 to 10: Reconnaissance
KLUE started with surface mapping. The early log lines are not glamorous but they set up everything else:
[KLUE] Identifying technology stack from response headers and asset fingerprints
[KLUE] Enumerating client side bundles and extracting embedded routes
[KLUE] Discovering API endpoints from JavaScript sources
[KLUE] Resolving cross subdomain assets in scope
A few things the agent pulled out of the bundles that would matter later:
- A cluster of API routes that were referenced from JavaScript but not linked anywhere in the rendered UI.
- A handful of configuration values embedded directly in the client, including what looked like bearer tokens.
- A permissive set of CORS response headers on the API host.
- A signup flow that posted a full user object to the backend on account creation.
None of this is an exploit yet. It is just a map. But every interesting bug later in the run was seeded here.
Minutes 11 to 25: Hypothesis Driven Probing
With the surface mapped, KLUE started forming and testing hypotheses. This is the phase where experience matters, because volume alone does not find bugs. You have to know where to look.
The agent focused on four families of tests in parallel:
- Authentication boundaries. Which endpoints allow anonymous access? Which ones silently accept unauthenticated requests when a header is missing?
- Authorization controls. Does the server trust client supplied identifiers? Are admin fields writable from unprivileged contexts?
- Input handling. Where does user input reach a database, a templating engine, or a file system path?
- Security configuration. CORS, cookies, security headers, transport settings.
Within this window KLUE started landing hits. A data lookup endpoint accepted a query string parameter and responded with a slightly different timing profile when a single quote was appended to the value. That was the thread that eventually led to the critical finding.
Minutes 26 to 45: Exploitation and Validation
Autonomous tools generate false positives because they stop at "this looks suspicious." KLUE is built not to stop there. In this phase every candidate was pushed until the agent either confirmed impact or ruled it out.
[KLUE] Confirming blind time based injection with statistically significant deltas
[KLUE] Verifying database user privileges via version and user primitives
[KLUE] Testing writable high privilege fields on account creation
[KLUE] Demonstrating CORS credential leak with a controlled origin
Each confirmation generated an artifact: raw request, raw response, timing data, environmental state. Nothing ambiguous, nothing based on banner matching or signature alone.
Minutes 46 to 61: Evidence and Report
The last phase was documentation. KLUE packaged each finding with a reproducible proof of concept, a blast radius description, a CVSS vector, and a remediation plan keyed to the actual technology stack it had observed. By the end of the hour, the run was done.
The Score
Eleven findings, distributed like this:
- 1 Critical at CVSS 9.8
- 5 High in the 7.5 to 8.1 range
- 4 Medium in the 4.3 to 5.3 range
- 1 Low at 2.7
Worth pausing on that distribution. It is not top heavy by accident. The critical finding was not a lucky one off grep; it was the endpoint of a small chain that had earlier Highs feeding into it. Hardcoded credentials, a loose CORS policy, and a writable privilege field all sat within reach of the same attacker who could already read the database outright. The whole is worse than any individual part.
The Crown Jewel: Blind SQL Injection on an Unauthenticated Endpoint
Every write up has one bug that deserves the long form treatment. This one earned it.
Where It Lived
The vulnerable endpoint was a GET request that took a user controlled lookup parameter. It was reachable without authentication and, in line with the application's CORS policy, reachable from arbitrary origins. The response was a small JSON payload that, at first glance, looked like a cache friendly data endpoint. Nothing about the response body hinted at what was underneath.
The moment of interest came from a timing differential. With a benign value the endpoint replied in the low hundreds of milliseconds. With a single quote appended, the response was faster, returning an empty result. With a carefully crafted boolean payload, the response time changed in a way that tracked the payload logic. That is the signature of a classic blind injection, not a guess.
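The judgment "the response time changed in a way that tracked the payload logic" can be made statistical rather than anecdotal. A minimal sketch, using only the standard library, of how two timing samples might be compared; the measurements and thresholds below are hypothetical stand-ins, not values from the real engagement:

```python
# Hypothetical sketch: decide whether payload timings are separated from the
# baseline by more than noise, instead of eyeballing "it looks slower".
import statistics

def looks_time_based(baseline_ms, payload_ms, min_delta_ms=500.0):
    """True when the payload median exceeds the baseline median by at least
    min_delta_ms AND by several baseline standard deviations."""
    base_median = statistics.median(baseline_ms)
    pay_median = statistics.median(payload_ms)
    spread = statistics.pstdev(baseline_ms) or 1.0  # guard divide-by-zero
    delta = pay_median - base_median
    return delta >= min_delta_ms and delta / spread >= 5.0

# Invented measurements: ~200 ms benign baseline, ~1200 ms with a
# conditional one-second sleep in the payload.
baseline = [190, 205, 198, 210, 201, 195, 208, 199]
with_sleep = [1190, 1225, 1204, 1211, 1198, 1207, 1219, 1202]

print(looks_time_based(baseline, with_sleep))  # True: signal, not noise
print(looks_time_based(baseline, baseline))    # False: same distribution
```

The point of the standard deviation check is that a raw delta alone is meaningless on a jittery endpoint; the gap has to dwarf the endpoint's own natural variance before it counts as a signal.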
Confirming Impact, Not Just Presence
Plenty of scanners will stop at "the timing looks funny." KLUE does not. The agent established a baseline timing distribution for the endpoint across dozens of requests, then used conditional sleeps to confirm that the back end was indeed executing attacker controlled SQL. From there it enumerated the environment:
- The database user the query was running under.
- Whether that user had superuser privileges.
- Whether the database had file system read or write capabilities exposed to it.
- Whether operating system level command execution was reachable from within the database process.
All four of those came back affirmative. The application was not talking to the database as a carefully scoped service account. It was running as a superuser with file system reach. In practice that meant the injection was not a data read bug; it was a host compromise waiting to happen.
What the Chain Actually Looks Like
Put the pieces together and the exploitation path reads like this:
- Anonymous attacker sends a crafted GET request to the lookup endpoint.
- The parameter is concatenated directly into a SQL query on the backend.
- The database user has enough privilege to read any table, including authentication and workflow tables.
- The same user has file system access, which enables arbitrary file read and, with the right primitives, code execution on the database host.
- Because the CORS policy reflects arbitrary origins with credentials, a victim's browser could also be weaponized to reach internal only variants of the endpoint if any are exposed through session trust.
That last point is why CORS always deserves attention. A misconfigured origin reflector turns what should be a server side only bug into something that can ride on a victim's session.
The Root Cause in One Sentence
String concatenation for query building plus a database role that was never scoped down. Neither issue is exotic. Both are preventable with defaults that every modern framework ships with.
The Supporting Cast
The critical was the headline, but the Highs are the ones that tend to bite quietly in production. A short tour.
Hardcoded Credentials in the Client Bundle
The SPA shipped with authentication material baked into its JavaScript. Anyone who opened the browser devtools could see them. The tokens were not scoped to a specific user; they were a broad service credential used for a backend integration that the front end proxied.
In practice this meant:
- No signup or login was required to call the protected service.
- The tokens could not be rotated without shipping a new build of the application.
- Anyone who ever viewed the site, including search engine crawlers and archive services, had a copy of them.
This is the kind of finding that looks innocuous in an isolated report ("low severity, internal service"), but when you combine it with the CORS misconfiguration and the SQL injection above, it becomes a second way into the same castle.
Permissive CORS With Credentials
The API reflected the Origin header back as Access-Control-Allow-Origin and also set Access-Control-Allow-Credentials: true. That is one of the more dangerous CORS configurations you can ship. It means any site a victim visits can issue authenticated requests to the target and read the responses from within the attacker's page.
The failure mode here is subtle. The server is not "open to the world" in the classic sense. It is open to any origin the browser chooses to trust, which in a credentialed flow is effectively the same thing. A phishing page, a compromised ad, or a typosquatted domain becomes a data exfiltration channel against authenticated users.
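The dangerous combination is mechanical enough to check for. A small illustrative function that classifies a probe response; the header names are standard CORS headers, but the probe origin and sample responses are invented for the sketch:

```python
# Hypothetical sketch: flag the reflect-plus-credentials combination
# described above, given the headers from a probe sent with a
# controlled Origin header.
def cors_reflects_with_credentials(probe_origin, response_headers):
    """True when the response echoes an arbitrary origin AND allows
    credentials -- the pairing that lets any page read authenticated
    responses."""
    headers = {k.lower(): v for k, v in response_headers.items()}
    allow_origin = headers.get("access-control-allow-origin", "")
    allow_creds = headers.get("access-control-allow-credentials", "").lower()
    return allow_origin == probe_origin and allow_creds == "true"

# Simulated responses to a probe sent with Origin: https://evil.example
reflected = {
    "Access-Control-Allow-Origin": "https://evil.example",
    "Access-Control-Allow-Credentials": "true",
}
safe = {"Access-Control-Allow-Origin": "https://portal.example"}

print(cors_reflects_with_credentials("https://evil.example", reflected))  # True
print(cors_reflects_with_credentials("https://evil.example", safe))       # False
```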
Mass Assignment on Account Creation
The signup endpoint accepted a JSON body representing a new user. Somewhere in that object, buried among innocuous fields like display name and locale, sat an isAdmin equivalent. It was not documented in the UI. The front end never sent it. But the backend deserialized the entire object onto the user model and persisted whatever fields it found.
The exploitation is trivial: sign up the way a normal user would, but add a single extra field. The agent confirmed the escalation end to end by creating an account with elevated flags set, then using that account to reach administrative endpoints that were supposed to require a full provisioning ceremony.
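The bug class reduces to a single unguarded copy. A stripped-down illustration of the vulnerable pattern, with field names like `isAdmin` and `displayName` as hypothetical stand-ins for whatever the real application used:

```python
# Hypothetical sketch of the mass assignment bug: the backend copies the
# entire signup body onto the user model, defaults included.
def create_user_vulnerable(signup_body):
    user = {"id": 1, "isAdmin": False}   # server side defaults
    user.update(signup_body)             # BUG: trusts every client field
    return user

# A normal signup vs. one with a single extra field appended.
normal = create_user_vulnerable({"displayName": "alice", "locale": "en"})
escalated = create_user_vulnerable({"displayName": "mallory", "isAdmin": True})

print(normal["isAdmin"])     # False
print(escalated["isAdmin"])  # True: privilege escalated at signup
```

In real frameworks the `update` call is usually an ORM or deserializer binding the request body onto the model, which is why the bug hides so well: nothing in the code looks like it grants privilege.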
The Rest
The remaining mediums and the low were the usual supporting cast: a missing security header here, an information disclosure in an error path there, a rate limit that was not quite a rate limit. None of them would warrant a standalone blog post. All of them belong in a report.
What an AI Agent Sees That Scanners Miss
At this point a fair question is: how is this different from a commercial web scanner? The short answer is that a scanner tests for patterns. KLUE tests for chains.
Three concrete examples from this run.
1. The Agent Reads the Client Like a Human Would
When KLUE extracted JavaScript bundles, it did not just grep for apikey and move on. It parsed route tables, followed conditional import paths, and matched the client side auth model against observed server responses. The hardcoded credential finding came not from a regex hit, but from noticing that a specific token was always attached to a specific cross origin request.
2. Findings Get Composed, Not Just Listed
A traditional tool would report the CORS misconfiguration and the hardcoded token as two separate items, each with its own severity. KLUE reasoned about them together. The write up explicitly notes that the token is reachable from an attacker origin because of the CORS policy, which raises the practical severity of both findings. That kind of compositional reasoning is what turns a shelf of mediums into a single actionable critical.
3. The Agent Earns Its Exploits
Every confirmation in this run has an artifact attached. The blind SQL injection was not reported until a controlled, mathematically distinguishable timing signal was recorded. The mass assignment was not reported until an elevated account was actually created and used to access a protected endpoint. The CORS issue was not reported until a credentialed cross origin read was demonstrated from a controlled test origin.
That discipline is also what keeps the report short. Out of many dozens of candidate leads the agent generated during the run, only eleven survived validation. The rest were ruled out and never reached the client.
Sixty One Minutes: Why the Clock Matters
Speed in security work gets a bad reputation because it is often a proxy for shallowness. A fast scan that says "we checked" is worse than no scan at all, because it invites false confidence. That is not the argument here.
The argument is that when you can complete a rigorous, chain aware assessment in an hour, you can run it more than once. You can run it after every deploy. You can run it before release. You can run it across a portfolio of applications on a quarterly rhythm without negotiating for budget each time. The unit economics change, and so does the security posture you can actually maintain.
For this client, the 61 minute number is not the interesting one. The interesting number is how often that run can be repeated without anyone blocking on it. That is where continuous autonomous testing starts to matter.
Remediation: What Went Out With the Report
The client received a report with per finding remediation tied to their actual stack. A condensed version of the top advice, because the patterns repeat across most of the clients we see:
For the SQL injection:
- Parameterize every query that touches user input. No exceptions, no string concatenation, no template helpers that paper over the same unsafe behavior.
- Scope the database user the application runs as. Remove superuser privilege. Remove file system reach. The application almost never needs either.
- Add database level observability so that an unexpected query shape can be detected in production.
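The first bullet is worth making concrete. A minimal sketch using `sqlite3` from the standard library; the table and column names are illustrative, and the same placeholder pattern applies to any driver:

```python
# Hypothetical sketch: string concatenation vs. a parameterized query
# against the same injection attempt.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (ref TEXT, status TEXT)")
conn.execute("INSERT INTO requests VALUES ('REQ-1', 'open')")

user_input = "REQ-1' OR '1'='1"  # classic injection attempt

# Unsafe: the payload becomes part of the SQL text itself.
unsafe_sql = f"SELECT status FROM requests WHERE ref = '{user_input}'"
print(conn.execute(unsafe_sql).fetchall())  # returns rows it should not

# Safe: the driver binds the value; the payload is just an odd string.
safe_rows = conn.execute(
    "SELECT status FROM requests WHERE ref = ?", (user_input,)
).fetchall()
print(safe_rows)  # [] -- no row is literally named "REQ-1' OR '1'='1"
```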
For the hardcoded credentials:
- Move the integration server side. Anything that needs a secret should not be callable from the browser.
- Rotate the tokens that were found. Assume they are already public, because they effectively are.
- Add a build time check that flags strings resembling tokens in the client bundle before it ships.
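One way to sketch that build time check: scan the bundle for strings shaped like secrets before it ships. The patterns and the sample bundle below are generic assumptions; a real pipeline would encode the team's actual key formats, or use a dedicated secret scanner:

```python
# Hypothetical sketch of a pre-ship secret scan over a client bundle.
import re

TOKEN_PATTERNS = [
    re.compile(r"\beyJ[A-Za-z0-9_-]{20,}"),          # JWT-like
    re.compile(r"\b[A-Za-z0-9]{32,}\b"),             # long opaque token
    re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{8,}"),
]

def scan_bundle(source):
    """Return every substring that looks like a hardcoded credential."""
    hits = []
    for pattern in TOKEN_PATTERNS:
        hits.extend(pattern.findall(source))
    return hits

bundle = 'const cfg = { apiKey: "sk_live_hypothetical_example_0123456789" };'
print(bool(scan_bundle(bundle)))  # True -- the build should fail here
```

Regexes like these are noisy by design; the cost of a false positive at build time is a few minutes of triage, while the cost of a shipped token is a forced rotation and a new release.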
For the CORS policy:
- Maintain an explicit allow list of origins. Never reflect arbitrary origins in a credentialed context.
- If an endpoint must be reachable cross origin, confirm that its authentication model can survive being called from an untrusted page. In most cases the answer is that it cannot, and the endpoint should not be credentialed cross origin at all.
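The allow list approach from the first bullet, as a minimal framework-neutral sketch. The origins are invented; in practice this logic lives in middleware for whatever server the application runs:

```python
# Hypothetical sketch: emit CORS headers only for known origins.
ALLOWED_ORIGINS = {"https://portal.example", "https://admin.portal.example"}

def cors_headers(request_origin):
    """Known origins get credentialed CORS headers; everything else gets
    none, so the browser blocks the cross origin read."""
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Access-Control-Allow-Credentials": "true",
            "Vary": "Origin",  # keep caches from mixing responses per origin
        }
    return {}

print(cors_headers("https://portal.example"))  # headers for a known origin
print(cors_headers("https://evil.example"))    # {} -- nothing is granted
```

Note the `Vary: Origin` header: without it, a cache can serve one origin's CORS response to another, quietly reintroducing the reflection problem.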
For the mass assignment:
- Define explicit DTOs for every write endpoint. Reject unknown fields.
- Treat privilege bearing fields as server side only. They should never be accepted from the client, regardless of current role.
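The explicit DTO rule can be sketched in a few lines: enumerate exactly the fields a signup may set and reject everything else. Field names here are hypothetical:

```python
# Hypothetical sketch of a strict signup DTO that rejects unknown fields.
SIGNUP_FIELDS = {"displayName", "locale", "email", "password"}

def parse_signup(body):
    """Accept only known, non-privileged fields; raise on anything extra."""
    unknown = set(body) - SIGNUP_FIELDS
    if unknown:
        raise ValueError(f"unknown fields rejected: {sorted(unknown)}")
    return {k: body[k] for k in SIGNUP_FIELDS if k in body}

print(parse_signup({"displayName": "alice", "locale": "en"}))
try:
    parse_signup({"displayName": "mallory", "isAdmin": True})
except ValueError as e:
    print(e)  # unknown fields rejected: ['isAdmin']
```

Rejecting unknown fields outright, rather than silently dropping them, is the stricter choice: it surfaces probing attempts in logs instead of hiding them.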
Takeaways
A few things worth pulling out of this run for anyone running an AppSec program:
- If your pentest cycle is annual, most of these findings would have lived in production for months. The speed of the assessment is a security control on its own.
- Findings are more dangerous in combination than in isolation. Any report that does not reason about chains is underselling the risk.
- Exotic bugs are rare. The boring classes keep winning: string concatenated SQL, permissive CORS, hardcoded secrets, mass assignment. Every pentest industry survey has said this for a decade, and every new engagement confirms it.
- Validation is the difference between noise and signal. A report full of "possible" and "potentially" is not a report. It is a backlog of work for someone else.
Closing Thought
KLUE is not magic. It is not replacing human judgment on the hardest bugs, and it is not claiming to. What it is doing, run after run, is collapsing the distance between "we have a new feature live" and "we know whether it introduced anything dangerous." Sixty one minutes on this engagement is one data point on a curve that is still bending.
If you want to know what an autonomous run against your own stack would look like, this is it: unvarnished, end to end, with the clock running.
KLUE is part of Shellvoide's Penetration Testing as a Service (PTaaS) platform, delivering autonomous security assessments powered by AI. For a scoped engagement against your own environment, or to see a full sample report from a run like this one, reach out at info@shellvoide.com.