URL: https://thenewstack.io/claude-mythos-preview-simulation/

Claude Mythos Preview completes full cyberattack simulation for the first time - The New Stack


An ASI evaluation shows Anthropic's Claude Mythos Preview can execute multi-stage cyberattacks and solve expert-level CTF challenges.
Apr 14th, 2026 2:35pm by Meredith Shubel

The UK-based AI Security Institute (ASI) this week released the results of its evaluation of Anthropic’s new Claude Mythos Preview, and the model, released just last week, is unlike anything that’s come before. 

The evaluation, intended to benchmark the model’s cybersecurity capabilities, reveals that Claude Mythos Preview has shown marked improvement in capture-the-flag (CTF) and multi-step cyberattack simulations. 

While the results can’t say definitively how the model would perform in real-world environments, they offer a warning: Claude Mythos Preview could be used to carry out autonomous multi-stage attacks on vulnerable systems.

Claude Mythos Preview: Too hot to handle?

While Anthropic launched Claude Mythos Preview on April 7, the AI giant didn’t hand over access to just anyone. Only a select group, including big-name players such as Amazon, Apple, Broadcom, Cisco, CrowdStrike, the Linux Foundation, Microsoft, and Palo Alto Networks, along with about 40 other organizations, was given a seat at the table via Project Glasswing, a new initiative Anthropic describes as “an effort to secure the world’s most critical software.”

Why did these teams get special access? 

It seems Anthropic considers Claude Mythos Preview too powerful for public release, at least right now. 

In a stroke of bad luck for Anthropic last week, an unsecured, publicly accessible data store exposed that the AI company was working on a new model called Mythos, “the most capable [model] we’ve built to date,” an Anthropic spokesperson told Fortune.

Now, the evaluation by ASI — a body run by the UK government’s Department for Science, Innovation and Technology — seems to give credence to that claim, stating in its announcement blog post that “our results show that Mythos Preview represents a step up over previous frontier models.”

The first AI model to autonomously execute a 32-step corporate network takeover

ASI conducted a series of controlled evaluations, giving Claude Mythos Preview explicit directions and access to discover and exploit vulnerabilities, enabling it to execute multi-stage attacks on vulnerable networks.

Carrying out these kinds of attacks requires chaining together dozens of hosts and network segments. It’s an arduous process that can take human operators anywhere from hours to weeks, and one that bad actors would likely hand to Claude Mythos Preview if and when they get their hands on the model.

To measure its ability to do so, the evaluation included “The Last Ones” (TLO), a 32-step corporate network simulation that runs from reconnaissance through full network takeover. ASI estimates it would take a human about 20 hours of grunt work to complete.

Claude Mythos Preview got the job done — and is the first model to do so. 

It completed TLO from start to finish in three of its 10 attempts. Across all 10 attempts, the model completed an average of 22 of the 32 steps.

Claude Mythos Preview’s performance is well ahead of Claude Opus 4.6, the next best-performing model, which completed an average of only 16 of the 32 steps.
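For a sense of scale, the reported TLO figures can be converted into percentages. The short Python sketch below does only that arithmetic on the numbers quoted above. The variable and function names are illustrative and do not come from ASI's report. The step counts are averages as reported here, so the resulting percentages should be read as approximate.

```python
# Figures as quoted in the article; names are illustrative, not from ASI.
TOTAL_STEPS = 32   # steps in "The Last Ones" (TLO) simulation
ATTEMPTS = 10      # attempts given to Claude Mythos Preview


def share(part: float, whole: float) -> float:
    """Return part/whole as a percentage."""
    return 100 * part / whole


# Average steps completed per attempt, as reported.
mythos_step_rate = share(22, TOTAL_STEPS)
opus_step_rate = share(16, TOTAL_STEPS)

# Fraction of attempts that finished the whole simulation.
full_run_rate = share(3, ATTEMPTS)

print(f"Mythos Preview average progress: {mythos_step_rate:.2f}% of steps")
print(f"Opus 4.6 average progress:       {opus_step_rate:.2f}% of steps")
print(f"Mythos Preview full completions: {full_run_rate:.0f}% of attempts")
```

Put this way, Claude Mythos Preview got a bit more than two-thirds of the way through the simulation on average, compared with about halfway for Claude Opus 4.6, and finished the whole thing in roughly one of every three attempts.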

It completed expert-level tasks 73% of the time

The TLO simulation wasn’t the only test ASI put Anthropic’s model through. 

The evaluation also included CTF challenges, in which the model must identify and exploit system weaknesses to retrieve hidden “flags.”

Again, the new model outranked existing models. Particularly noteworthy is its performance on expert-level tasks: Claude Mythos Preview succeeded 73% of the time.

Before April 2025, no model could complete these tasks at all.

What the results do — and don’t — mean

While ASI’s evaluation certainly reveals stunning results about Claude Mythos Preview’s cybersecurity capabilities, it doesn’t paint a crystal-clear picture of what could happen in the real world. 

Yes, the results show the model is capable of autonomously attacking systems — but ASI points out that there are differences between its evaluation and real-world environments. 

For one, ASI clarifies that its results mean Claude Mythos Preview can autonomously attack “small, weakly defended and vulnerable enterprise systems where access to a network has been gained.” 

The body notes that real-world systems likely have security features in place, like active defenders or defensive tooling. Plus, in the real world, the model would likely trigger certain security alerts, another factor not accounted for in ASI’s tests. 

And Claude Mythos Preview didn’t ace everything. The model was stumped by IT sections in the operational technology-focused cyber range, “Cooling Tower.”

There’s no ignoring that what Claude Mythos Preview did accomplish in ASI’s evaluation is unprecedented — and its capabilities will surely only evolve as other models also advance. 

But even as its evaluation underscores the growing cybersecurity threats AI models pose, ASI also issues a disclaimer: “We cannot say for sure whether Mythos Preview would be able to attack well-defended systems.”

Meredith Shubel is a technical writer covering cloud infrastructure and enterprise software. She has contributed to The New Stack since 2022, profiling startups and exploring how organizations adopt emerging technologies. Beyond The New Stack, she ghostwrites white papers, executive bylines,...
TNS owner Insight Partners is an investor in: Anthropic.