May 11, 2026
Posted by Or Amar
Every finding becomes a skill. Every skill runs on every pull request. The system never stops learning.
At Akeyless, our platform is the thing attackers want most, the system that stores and manages every other system’s credentials. A vulnerability here isn’t just a data leak. It’s a skeleton key.
So when we integrated AI into our security program, we didn’t build a scanner. We built a system that compounds knowledge. This post breaks down the full methodology – how the agent learns to think about security, how it reviews and tests every pull request (PR) automatically, and why it gets smarter with every line of code we ship.
When an investigation wraps up — say, discovering that a token rotation window allows two credentials to coexist – the result isn’t just a report. The investigative logic gets extracted: what was looked for, what conditions triggered the finding, what context mattered, and how to fix it. That extraction becomes a skill, a structured set of instructions written in natural language that the agent can interpret and apply independently.
The skill gets added to the agent’s persistent context. From that point on, every time the agent reviews a PR, it loads its full skill set alongside the code diff. It doesn’t pattern-match against a regex database. It reasons over the code using the accumulated investigative logic of every past review. The same way a senior security mind would, except the agent never forgets a single lesson.
The more investigations that run, the more skills the agent carries. The more skills it carries, the more it catches on pull requests. The more it catches on PRs, the more new patterns surface for the next investigation. Twelve months of this loop, and the agent reviews every pull request with the institutional knowledge of every security investigation ever conducted against the platform.
This is how we built it.
The Methodology: How the Agent Learns to Think
Before the agent earns skills, it learns how to reason about security. The training is modeled on how the best security work actually happens when approaching an unfamiliar feature. Here’s the framework.
Understand first. The agent builds a complete mental model of the feature before it looks for problems. For an authentication method like Universal Identity, our solution to the machine-identity bootstrapping problem, that means understanding the token lifecycle, the rotation model, the trust anchor, and who the intended consumer is. Every design decision is a tradeoff. Every tradeoff is a surface.
Compare everything. A single auth method in isolation tells you nothing. The agent maps it against every other method we support: SAML, OIDC, Kubernetes, AWS IAM, certificates, and more. Not to review them all, but to surface the right questions. For instance: Kubernetes auth tokens never leave the customer’s environment. Universal Identity tokens cross to our SaaS control plane. Why? What are the implications? That question only exists because the agent compared.
Map the flow end-to-end. Every request, every response, every header, every error code. The agent captures the full interaction and flags what matters: responses that return more data than the caller needs, error messages that reveal internal details an attacker could use, and transitions between states that leave a brief window where the system is vulnerable. It also examines every input field, including what data types are expected, what happens when unexpected types are sent, and whether any input reaches backend systems without proper sanitization, opening the door to injection attacks.
Build the threat model. The agent doesn’t think about abstract risks. It picks a specific attacker — say, a compromised CI/CD pipeline that has a valid UID token — and walks through the system step by step, as that attacker. Can this token reach the admin API? Is there a permission check, or does the endpoint just trust any valid token? If there’s a permission check, does it verify the token’s role or just its signature? The agent follows the actual request path through the Gateway, the control plane, the RBAC engine, and at every hop it asks: is this enforced in code, or just assumed?
Challenge the code. Now the agent takes everything it learned and holds the implementation accountable, flagging every gap between what the feature is supposed to guarantee and what the code actually does. And it checks the reverse: that enforcing security doesn’t break the feature itself. A permission check that blocks attackers but also blocks legitimate users isn’t a fix, it’s a new bug.
The Pull Request Pipeline: Review, Test, Re-Test, Final
The skills the agent learns don’t sit in a document. They run. On every pull request. Automatically.
When a developer opens a PR that touches authentication, secrets handling, or any security-relevant path, the full pipeline activates:
Stage 1: Full Review. The agent maps the changes to the auth flows and trust boundaries they affect, applies every accumulated skill against the diff, and posts findings directly to the PR. Each finding has a severity, a concrete attack scenario, and an inline code fix. Not “consider validating this input.” An actual fix, in the right file, at the right line, that the developer can review and apply. We learned early on that vague findings get ignored. Specific findings with working fixes get merged.
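To make the shape of such a finding concrete, here is a minimal sketch of the structured record the agent might post. The `Finding` fields and `format_for_pr` helper are hypothetical illustrations, not Akeyless’s actual schema:

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One security finding posted to a pull request (illustrative schema)."""
    severity: str         # e.g. "high", "medium"
    title: str            # short description of the issue
    attack_scenario: str  # concrete path an attacker could take
    file: str             # file the suggested fix applies to
    line: int             # line the suggested fix applies to
    suggested_fix: str    # inline code the developer can review and apply


def format_for_pr(f: Finding) -> str:
    """Render a finding as the comment the agent would post on the PR."""
    return (
        f"[{f.severity.upper()}] {f.title}\n"
        f"Attack scenario: {f.attack_scenario}\n"
        f"Suggested fix ({f.file}:{f.line}):\n{f.suggested_fix}"
    )
```

The point of the structure is the contract it enforces: no finding ships without a severity, a scenario, and a fix anchored to a file and line.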
Stage 2: Full Test. The agent doesn’t just flag issues, it builds test cases to prove them. If the review finds that an endpoint doesn’t check whether the caller is authorized to perform the action, the agent generates a test: authenticate as User A, attempt the action as User B, verify the request is rejected. If the review found that an error response leaks internal service names, the agent generates a test: send an invalid request, capture the response, verify it contains no internal metadata. Every finding gets a test. Every test gets posted to the pull request with a pass/fail result.
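The User A / User B test above can be sketched against a toy in-memory service. Everything here (`ToyService`, `delete_secret`) is a hypothetical stand-in, not the Akeyless API; it only illustrates the shape of the generated test:

```python
class ToyService:
    """Toy in-memory service standing in for a real endpoint under test."""

    def __init__(self):
        self.secrets = {"alice": "s3cret-a", "bob": "s3cret-b"}

    def delete_secret(self, caller: str, owner: str) -> bool:
        # Correct behavior: a caller may only delete their own secret.
        if caller != owner:
            raise PermissionError("caller is not authorized for this secret")
        del self.secrets[owner]
        return True


def test_cross_user_action_is_rejected() -> str:
    """Authenticate as alice, attempt an action on bob's secret, expect rejection."""
    svc = ToyService()
    try:
        svc.delete_secret(caller="alice", owner="bob")
    except PermissionError:
        return "PASS"  # the endpoint enforced authorization
    return "FAIL"      # the endpoint trusted any authenticated caller
```

A vulnerable implementation would make this test return "FAIL", which is exactly the pass/fail signal posted back to the pull request.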
Stage 3: Re-Test. After the developer pushes fixes, the agent doesn’t just re-run the same checks. It does three things. First, it verifies each original finding is actually fixed – not just that the code changed, but that the test case from Stage 2 now passes. Second, it checks whether the fix broke something nearby. For example, if the developer added a new validation function, the agent checks: does that function get called correctly everywhere it’s now used? Did adding it change the response format in a way that breaks other callers? Third, it re-applies all relevant skills against the updated diff. Because the fix itself is new code, and new code can have its own issues.
Stage 4: Final Test. The merge gate. The agent steps back from the individual findings and looks at the big picture: after this pull request merges, is the system still secure end-to-end? It tests full authentication flows — not just the endpoints this PR touched, but the complete path from token creation to secret retrieval. This matters because pull requests don’t ship in isolation. Maybe last week a different PR added a caching layer for auth responses. This week’s PR changes how tokens are validated. Individually both are fine. But if the cache is now serving auth responses that were validated under the old logic, the system has a gap. The final test catches this because it tests the actual combined state of the code, not just the diff.
Pass → the agent signs off. Fail → the merge is blocked with a clear explanation.
No handoffs. No waiting for a review cycle. The feedback is on the pull request, in minutes.

The Snowball Effect: Findings Become Skills, Skills Become Automated Checks
When an investigation wraps up, the report isn’t the end product. The investigative logic gets extracted — the specific pattern that was recognized, the conditions that were checked, the reasoning that led to the finding — and encoded as a discrete skill the agent can execute independently.
Concretely, a skill is a set of instructions: what to look for, where to look for it, what context matters, what constitutes a finding, and how to fix it. It’s not a regex or a static rule. It’s an investigative thought process, packaged so the agent can replay it against any code change that touches the relevant area.
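Assuming a skill really is just those five fields plus a relevance check, a minimal sketch might look like this. The `Skill` fields and `applies_to` helper are illustrative, not a real schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """One extracted investigative skill (illustrative fields, not a real schema)."""
    name: str              # short identifier for the skill
    look_for: str          # what to look for
    where: str             # which code area the skill applies to
    context: str           # what surrounding context matters
    finding_criteria: str  # what constitutes a finding
    remediation: str       # how to fix it


def applies_to(skill: Skill, changed_paths: list[str]) -> bool:
    """Naive relevance check: does the diff touch the skill's area?"""
    return any(skill.where in path for path in changed_paths)
```

In practice the relevance check would be semantic rather than path-based, but the structure is the point: every field the agent needs to replay the investigation travels with the skill.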
To show how this works, here’s a simplified example. This is not a real finding, but representative of how the process plays out.
Suppose a deep investigation into one of our authentication flows reveals a pattern: the system checks whether a token’s signature is valid, and if it is, trusts everything inside. Think of it like a VIP pass at an NBA game. Security checks the hologram, scans the barcode, confirms it’s real, but nobody checks the name on it. Anyone holding a valid pass walks into the locker room, even if it’s not theirs. The finding gets fixed.
The investigative logic gets extracted into a skill: “When reviewing any code that validates authentication tokens, check what happens after signature validation passes. Verify that the code doesn’t stop there, it must also confirm that the token’s claims match the request being made. Who is this token for? What permissions does it grant? Is it being used within its intended scope? A valid signature alone is not authorization.”
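The thinking error behind this hypothetical finding fits in a few lines. The sketch below hand-rolls an HMAC-signed token: the vulnerable version would stop after the signature check, while `validate` also confirms the claims (`sub`, `scopes`) match the request being made. The signing scheme, key, and claim names are all illustrative:

```python
import base64
import hashlib
import hmac
import json

KEY = b"demo-key"  # illustrative only; never hardcode keys in real code


def sign(claims: dict) -> str:
    """Issue a token: base64-encoded claims plus an HMAC signature."""
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def validate(token: str, expected_sub: str, required_scope: str) -> bool:
    """A valid signature alone is not authorization: also check the claims."""
    body, sig = token.rsplit(".", 1)
    expected_sig = hmac.new(KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected_sig):
        return False  # forged or tampered token: the hologram check
    claims = json.loads(base64.urlsafe_b64decode(body))
    # The extra checks the skill demands: who is this token for,
    # and is it being used within its intended scope?
    return (
        claims.get("sub") == expected_sub
        and required_scope in claims.get("scopes", [])
    )
```

A token with a perfectly valid signature but the wrong `sub` or an insufficient scope is the VIP pass with someone else’s name on it, and it gets rejected.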
That skill now runs on every pull request. It doesn’t just prevent the same issue from reappearing, it catches the same class of thinking error across completely different features. A PR that modifies how session tokens are verified. A PR that changes how API keys are validated against permission sets. Different code, different auth method, same underlying pattern. The skill flags it because the agent was taught what to think about, not just what to grep for.
That’s the loop: Investigation → Finding → Extract the thinking → Encode as skill → Skill runs on all future PRs → New patterns surface → Investigation
Why This Is Specific to Akeyless
This isn’t a generic playbook. Three things about our platform shape every part of this system:
Distributed trust is the product. Most systems store encryption keys in one place. If that place is compromised, everything is exposed. Akeyless doesn’t work that way. Keys are split into fragments held by separate, independent parties. No single party, including Akeyless, ever holds a complete key. Our agent is trained to reason about this model, because a security flaw in a distributed trust system looks completely different from a flaw in a traditional vault.
The Gateway is a trust boundary. Our Gateway runs in the customer’s environment. For some auth methods, credentials never leave that environment. For others, they cross to our SaaS plane. The agent tracks which trust model applies to which method, per request. A skill learned from Kubernetes auth — “credentials must not cross the customer boundary” — will automatically flag a regression if a future PR accidentally routes K8s tokens to the SaaS control plane.
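In its simplest form, such a boundary skill could reduce to a single predicate over the diff’s routing change. The method names and destination labels below are assumptions for illustration, not the real Akeyless configuration:

```python
# Auth methods whose credentials must never leave the customer environment.
# Illustrative set; the real classification lives in the platform itself.
CUSTOMER_LOCAL_METHODS = {"kubernetes"}


def violates_trust_boundary(auth_method: str, destination: str) -> bool:
    """Flag a regression: a customer-local credential routed to the SaaS plane."""
    return (
        auth_method in CUSTOMER_LOCAL_METHODS
        and destination == "saas_control_plane"
    )
```

The real skill reasons over code paths rather than string labels, but the invariant it protects is exactly this one line.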
Blast radius is everything. A vulnerability in a typical application leaks that application’s data. A vulnerability in Akeyless could leak the credentials that protect every other system in an organization. Every skill’s severity assessment is weighted by downstream impact. Not just “can this be exploited,” but “what does the attacker reach through a compromised Akeyless credential?”
What This Gives Us, And What It Doesn’t
What we get: Every pull request reviewed with the full institutional knowledge built up over every past investigation. Findings with severity, attack scenario, and working code fix — posted automatically. A four-stage pipeline with no human bottleneck. A skill set that compounds with every investigation.
What this doesn’t replace: Human judgment on novel attack classes. Runtime testing on live environments. The adversarial creativity that finds the thing nobody was looking for.
The agent handles the known-knowns and surfaces the known-unknowns. The human side focuses on the unknown-unknowns, the work that actually requires creative, adversarial thinking about a system that’s deeply understood.
A Security Program That Learns
Most security programs scale linearly — more code means more reviewers. Ours scales exponentially. The human side stays focused where it’s irreplaceable, probing the system with the kind of adversarial creativity that can’t be codified. When something new is found, the thinking behind it gets captured as a skill, and the agent makes sure that problem never comes back.
We didn’t automate security reviews. We built a security program that never stops learning.