The asymmetry at the heart of AI security

AI is making the known problems trivial and the unknown ones more dangerous. The security professionals who thrive will be the ones who understand the difference.

Six months ago, a group of traders introduced themselves to the Drift protocol team at a crypto industry conference. They presented as credible, technically sophisticated, and well-capitalized, the exact type of partner an upstart DeFi platform would want.

Throughout the fall and winter, members of this group continued to deepen their relationship with Drift. They attended in-person meetings, deposited one million dollars of their own funds into the protocol, and shared code repositories showcasing trading strategies they were evaluating. They also used that time to silently compromise Drift signing machines, artificially inflate a token they controlled to a nominal value of $500 million, and forge several malicious transactions that would hand them administrative control of the protocol.

Then, on the evening of April 1st, they attacked.

These trading partners, who turned out to be a front for North Korea's Lazarus Group, cashed out. In total, they drained $280 million¹ of real funds, collateralized by their worthless token and the half year of real trust they'd built with the team.

During the same 6-month window, we've seen increasingly sophisticated offensive security capabilities demonstrated by frontier AI models. Claude Code can now find critical exploits in the Linux kernel. Its latest Mythos model is so powerful that no publicly accessible release is planned. Security expertise has become widely accessible, and we now live in a world where open source agents can run attack playbooks that were exclusively under the purview of elite teams just a few years ago.

One can easily envision a bleak future for the state of cybersecurity. Both for the industry and its human practitioners, who seem more irrelevant with each model release.

But there's hope.

The Frenchman's lament

To understand what AI can and cannot do in security, it helps to understand what it can and cannot do in general. François Chollet and the ARC Foundation recently released ARC-AGI-3², a set of challenges designed to benchmark the general reasoning capabilities of AI models. The puzzles look like visually stimulating games designed for schoolchildren. They come with no instructions, require on-the-fly heuristic acquisition, and heavily penalize brute-force approaches.

Because of this, an 8-year old child can solve them, and the 800-billion dollar frontier labs are hopelessly lost. More specifically, the research paper⁸ accompanying the ARC-AGI-3 release states that humans, on average, have a 100% pass rate on these games. ChatGPT has 0.26%.

100%

Average human pass rate on ARC-AGI-3

0.26%

ChatGPT pass rate on the same challenges

$280M

Drained by Lazarus Group via a novel social engineering playbook

On Twitter, in podcasts, and across research papers, Chollet repeatedly argues that the underperformance of models like ChatGPT, Claude, and Gemini is not incidental. It is inherent to the LLM architecture. While they seem to exhibit generalized problem solving capabilities, what we're seeing is high-dimensional pattern recall masquerading as intelligence. And this pattern recall approach, essentially sophisticated memorization of terabytes of training data, places a practical upper bound on how powerful these systems can ever become.

As a consequence, they fail catastrophically on classes of problems that are genuinely novel.

The implication for security is direct. AI models are brittle against out-of-distribution (OOD) problems; novel tasksets that do not map cleanly to their priors. And security, unlike solving math olympiad problems or writing performant code, is an OOD domain. It is an iterative, adversarial game, with an adaptive opponent. An opponent that, when sufficiently motivated, can always find a way to manufacture out-of-distribution problems.

The shape of the new threat

The Drift protocol attack is one data point.

Another: sophisticated deepfake images and videos are now being integrated into attackers' workflows to socially engineer victims. In 2024, a finance employee at a Hong Kong firm was manipulated into wiring $25 million to attackers⁹ after joining a video call where every other participant, including the apparent CFO, was an AI-generated deepfake. Late last month, the npm ecosystem's axios library was compromised in a supply chain attack that used similar AI-generated trust signals¹⁰ to gain access to a maintainer's credentials.

A third: In the fall of 2025, Anthropic disrupted a first-of-its-kind fully autonomous cyber-espionage campaign⁴ run by Chinese state actors. They targeted 30 high-profile western entities, including tech companies, financial institutions and government agencies. According to Anthropic's research, the attackers were able to "leverage AI to execute 80-90% of tactical operations independently at physically impossible request rates". Anthropic also notes that they gained full access to a handful of targets. It's important to take a look at the target list again and understand that this wasn't a successful exploit of a 50-person startup, or a social engineering campaign against an open-source developer. It was a small team compromising some of the world's most well defended institutions using a commercially available model and open source tooling.

The unsolved attack surface

AI itself presents a poorly understood attack surface.

A paper out of UCSB⁵ this month showed that a single compromised intermediary in the LLM stack can silently rewrite tool calls and extract sensitive data, effectively taking control of an agent's behavior. In real-world tests, attackers were able to inject malicious commands and extract credentials (including Ethereum private keys and AWS secrets) without any visibility at the model or application layer.

Enforcing traditional security best practices like least privilege and isolation on agents has proven difficult, and attack vectors like prompt injection and various poisoning methods remain unsolved. Crucially, these attacks emerge from entirely new vulnerability classes introduced by AI-native architectures. Classes that current models are not trained to recognize or defend against.

The asymmetry

An important asymmetry emerges here.

The implication of the ARC-AGI-3 results is that our AI systems by themselves are handicapped in effectively identifying and responding to OOD attacks.

On the other hand, attackers don't need AI systems to generalize, but to scale. A human adversary with a novel strategy, whether that's a six-month infiltration campaign, a supply chain attack timed across two release branches, or an autonomous espionage operation, can use AI to execute faster, cheaper, and at greater reach than was previously possible. AI has brought the "ambient cost of running an effective offensive operation down dramatically"³.

This imbalance provides a structural advantage to attackers. The same AI systems that are used to amplify novel strategies are not, on their own, reliable at recognizing or stopping them. Unless, of course, they're augmented with human collaborators, partners that do not collapse when faced with OOD events.

Adversaries can leverage AI systems to generate out-of-distribution attacks faster and cheaper than ever before. Autonomous defenses are structurally handicapped against them.

The humans who become more valuable

It's worth examining what that human-AI collaboration might look like.

Dan Guido, founder of Trail of Bits, recently spoke⁶ at the [un]prompted AI security conference on rebuilding his company from the bottom up to be an AI-native security consultancy. The stated goal was to "let humans and autonomous agents ship high-rigor work at dramatically higher throughput". The model should sound familiar.

The result: human auditors have gone from finding roughly 15 bugs a week to finding 200. They built 94 plugins, 201 skills, and 84 specialized agents, and they open-sourced most of the infrastructure.

Two things are worth noting about those numbers. First: they required a year of hard internal work, with 95% of the organization initially resistant, before they became achievable. Second, and more importantly: the AI surfaces candidates but still leverages human expertise to triage, evaluate, and judge what paths are and aren't worth following.

15→200

Bugs found per week, before and after AI-native restructuring at Trail of Bits

~20%

Of client-reported bugs now initially surfaced by AI, each validated by an auditor

Plugins built to encode 14 years of audit expertise into agent-consumable form

Another noteworthy statistic: 86%. The Trail of Bits team built a Claude skill for identifying cases of token unit mismatch in smart contract code. This is a particularly nasty class of bugs in Web3 products, as it's difficult to identify consistently, and has been the cause of at least $230 million in lost or drained funds. The skill leverages dimensional analysis, a mathematical technique typically taught to US students in grade 7, to outperform baseline LLM performance by 86%⁷.

This is the model for the security professional who survives the transition: not someone who competes with AI on catalogued vulnerability detection, but someone who encodes their expertise into systems that make AI effective, that validates what AI surfaces, and directs attention toward the class of problems AI cannot see.

The irony is that AI both makes human security expertise less necessary and more critical. Less necessary for the known, the prior observed, and the well-documented. More critical for everything outside that, which is where the most dangerous adversaries will operate.

We should expect the average system being built in the next few years to be significantly more secure than ever before. We should also expect the exploits that succeed to be more consequential than anything we've seen. The bar is raised, but so is the ceiling. And the gap between what is known and what is possible is precisely where security will continue to matter.

Sources & notes

Drift Protocol disclosure thread. The attackers inflated a worthless token to ~$500M in reported collateral, then borrowed real assets against it.x.com/DriftProtocol/status/2040611161121370409
Chollet, F., & ARC Foundation. ARC-AGI-3: Abstraction and Reasoning Corpus v3. March 2026.arcprize.org/arc-agi/3
ringmast4r, "We may be living through the most consequential moment in offensive security."ringmast4r.substack.com/p/we-may-be-living-through-the-most
Anthropic. Disrupting the first reported AI-orchestrated cyber-espionage campaign. 2025.anthropic.com/research
UCSB Computer Security Lab. Silent tool-call rewriting in the LLM stack. April 2026.arxiv.org/abs/2604.08407
Guido, D. Keynote at [un]prompted AI security conference.youtube.com/watch?v=kgwvAyF7qsA
Trail of Bits. Dimensional analysis Claude skill for token-unit mismatch detection.x.com/trailofbits/status/2036775300579672233
Chollet, F., et al. ARC-AGI-3 research paper. March 2026.arxiv.org/abs/2603.24621
CNN. Finance worker pays out $25 million after video call with deepfake "chief financial officer." February 2024.cnn.com/2024/02/04/asia/deepfake-cfo-scam-hong-kong-intl-hnk
GitHub. axios/axios supply-chain incident discussion. 2026.github.com/axios/axios/issues/10636

Let's discuss your next
security milestone

The asymmetry at the heart of AI security.

The Frenchman's lament

The shape of the new threat

The asymmetry

The humans who become more valuable

Let's discuss your next security milestone

The asymmetry at the heart of AI security.

The Frenchman's lament

The shape of the new threat

The asymmetry

The humans who become more valuable

New Praxis posts, delivered as they're published.

Let's discuss your next
security milestone