Investment Notes

Autonomous Red-Teaming at Scale: The Hadrian Investment Note

Martijn Hoekstra

When we led Hadrian's seed round in 2022, the automated offensive security testing market was at an inflection point. The category label "automated penetration testing" had been used for years to describe tools that were, in practice, automated vulnerability scanners with light exploitation capability — Nessus and its successors. What Hadrian was building was categorically different: a platform that could perform multi-step attack path validation, chaining together reconnaissance, exploitation, and post-exploitation reasoning to identify which combinations of discovered weaknesses were actually exploitable in the target's specific environment. This distinction — between enumerating vulnerabilities and validating exploitable attack paths — is the difference between knowing that a system has unlocked doors and knowing which sequence of doors, traversed in the right order, leads to the server room.

The technical challenge that makes autonomous offensive security hard is the same challenge that makes red team operations expensive: it requires contextual reasoning about an environment that the tool has not seen before. A Metasploit module fires against a known service version on a specific port. An autonomous offensive platform needs to reason about its discovered environment: which service versions are running, what trust relationships exist between components, what credentials or tokens found during one step of reconnaissance can be applied in subsequent steps. Chaining exploit primitives together into an end-to-end attack path requires understanding the attack surface as a graph rather than as a list of independent findings — and evaluating which paths through that graph represent realistic adversarial objectives, not just theoretical reachability.
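To make the graph framing concrete, here is a minimal sketch of how findings can be modelled as edges and searched for the lowest-effort route to an objective. It is an illustration under stated assumptions, not a description of Hadrian's implementation: the `Step` type, the `cost` effort score, the asset names, and the example findings are all hypothetical, and the search is a plain Dijkstra shortest-path over that graph.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One validated finding, modelled as a directed edge (hypothetical schema)."""
    source: str      # foothold the attacker already holds
    target: str      # asset reachable by using this finding
    technique: str   # human-readable description of the weakness
    cost: float      # assumed effort score; lower means easier for an attacker


def cheapest_attack_path(steps, start, goal):
    """Return (total_cost, [Step, ...]) for the lowest-cost chain, or None."""
    graph = {}
    for step in steps:
        graph.setdefault(step.source, []).append(step)

    # Heap entries: (cost so far, tie-breaker, current node, steps taken).
    # The unique tie-breaker keeps heapq from ever comparing Step lists.
    queue = [(0.0, 0, start, [])]
    counter = 1
    settled = set()

    while queue:
        cost, _, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for step in graph.get(node, []):
            if step.target not in settled:
                heapq.heappush(
                    queue, (cost + step.cost, counter, step.target, path + [step])
                )
                counter += 1
    return None


# Illustrative findings only; none of these come from a real assessment.
STEPS = [
    Step("internet", "web-app", "exposed admin panel", 1.0),
    Step("web-app", "ci-runner", "reused deploy token", 2.0),
    Step("ci-runner", "database", "overly broad service account", 1.5),
    Step("internet", "vpn-gateway", "unpatched service", 4.0),
    Step("vpn-gateway", "database", "flat internal network", 3.0),
]

result = cheapest_attack_path(STEPS, "internet", "database")
```

Taken as a flat list, the five findings look like five independent issues. Treated as a graph, they resolve into two distinct routes to the database, and the search surfaces the cheaper three-step chain through the web application and CI runner. A realistic prioritisation model would weigh far more than a single effort score, but the structure of the problem is the same.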

Our diligence on Hadrian centred on three technical validation questions. First: how does the platform enforce scope boundaries so that legitimate security testing does not cause operational disruption? This matters enormously to enterprise buyers, because a tool that takes down a production service during a scheduled security assessment has failed catastrophically from an operational perspective. Second: how does the platform model the attacker's decision-making about which paths to pursue, rather than exhaustively testing every reachable path? Attack path selection that mirrors realistic adversary prioritisation produces findings that are operationally meaningful, not just technically valid. Third: how does the platform generate findings that give remediation teams enough context to actually fix the problem, including the specific chain of conditions that made the attack path viable, rather than just reporting a terminal vulnerability?
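The first question, scope enforcement, is the easiest of the three to illustrate. The sketch below shows one simple form such a guard could take: every candidate target is checked against an allowlist of authorised networks and a denylist of fragile systems before any action is attempted. The function name, the parameters, and the address ranges are assumptions made for illustration and do not describe how Hadrian handles scope.

```python
import ipaddress


def in_scope(target, allowed_networks, excluded_networks):
    """Return True only if target is authorised and not explicitly excluded.

    Exclusions win over allowances, so a fragile production subnet can be
    carved out of a broader authorised range.
    """
    address = ipaddress.ip_address(target)
    if any(address in ipaddress.ip_network(net) for net in excluded_networks):
        return False
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)


# Hypothetical engagement scope: one authorised range, one carve-out.
ALLOWED = ["10.0.0.0/16"]
EXCLUDED = ["10.0.1.0/24"]

checks = {
    "10.0.2.5": in_scope("10.0.2.5", ALLOWED, EXCLUDED),
    "10.0.1.5": in_scope("10.0.1.5", ALLOWED, EXCLUDED),
    "192.168.1.1": in_scope("192.168.1.1", ALLOWED, EXCLUDED),
}
```

Checking the exclusion list first makes the guard fail closed: an address that appears in both lists is treated as out of scope. A production system would also need to bound what is done to an in-scope target, which an address check alone does not address.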

We are not arguing that autonomous offensive testing replaces human red teams. A skilled human operator brings creative lateral thinking, social engineering capability, and the ability to improvise against unexpected defensive responses, none of which any automated platform currently replicates. What we are saying is that the use case is different: an autonomous platform running continuous assessments against your external and internal attack surface produces a temporally dense signal. It catches the configuration drift, the new service deployment, and the newly exposed API endpoint in a way that a quarterly red team engagement cannot. The two are complementary. The human red team is the gold standard for testing your security posture against the techniques a sophisticated adversary would actually use. The continuous automated platform is the monitoring layer that tracks the state of your attack surface between those engagements.

Hadrian's market positioning within the European security ecosystem also factored into our conviction. European enterprises face a specific compliance landscape (NIS2, DORA for financial institutions, and sector-specific requirements for critical infrastructure operators) that increasingly requires documented evidence of regular security testing and validated attack surface management. A platform that generates audit-grade findings documentation as a natural output of its testing cycles, rather than requiring manual report-writing after the fact, addresses a genuine operational need that will only intensify as regulatory enforcement under NIS2 becomes more active. The compliance tailwind is not the primary thesis (the technical differentiation is), but it creates a purchasing justification in regulated sectors that accelerates adoption.