Autonomous Malware Is No Longer Theoretical: AI Worm Proof Of Concept Created In A Lab
On June 2, 2026, security researchers published a paper about the creation of an AI worm. The headline is as subtle as a fire alarm: This lab-built worm is no longer just code that blindly crawls across your environment; it leverages AI models and can reason, execute, and learn with complete autonomy.
This is certainly not the first lab-created malware. We have seen experimental or proof-of-concept worms before, from Creeper and Reaper to the Xerox PARC worms, the Morris worm, and Cabir. But the Morris worm remains the cautionary tale with teeth. Created by Robert Tappan Morris and released in 1988, it rapidly became an internet-wide disaster that helped make clear the need for security norms, regulations, and laws governing the internet.
This latest research opens a whole can of worms (pun intended). The researchers used publicly available open-weight AI models to demonstrate an autonomous worm capable of discovering and exploiting vulnerabilities while spreading laterally across the network. Thankfully, the work was conducted in an isolated lab environment and was disconnected from the internet.
The implications of this research are hard to ignore as security leaders prepare to handle autonomous threat operations. Experts must grasp the following:
- The economics of AI-enabled threat activity become more complex. Recent headlines around Anthropic’s Claude Mythos Preview and OpenAI’s Daybreak have caught the attention of security practitioners, business leaders, and governments. This research on AI worms, however, pushes the conversation somewhere more uncomfortable, as it expands the scope toward the use of small open-weight models for offensive security use cases. Additionally, in such attack vectors, the victim’s infrastructure becomes part of the adversary’s operating budget. While the design and code of this AI-enabled worm have been obfuscated for safety reasons, threat actors can leverage such low-cost designs to exploit at scale with a more “intelligent” approach. This does not imply that every threat actor now has a nation-state capability in a box. But it does mean defenders should stop assuming adversary TTPs will remain stable and repeatable. Meanwhile, security teams are already worrying about token spend, ROI, inference costs, and the operational complexity of using AI agents for defensive use cases. The resulting economic asymmetry between defenders and attackers is brutal. To begin with, enterprises need to start measuring the performance of their defensive AI agents not just by token usage or cost, but by parameters such as failure rates, sub-agent effectiveness, task quality, and escalation frequency, to help optimize their effectiveness over time.
- AI-enabled exploitation is autonomous, but execution still takes time. In this lab setup, the autonomous AI worm, with no human in the loop, still took about 7 days to complete each experimental run. The researchers conducted 15 independent experiments on an isolated 33-host network spanning Linux servers. In total, the proof-of-concept worm took 2,520 hours (15 independent runs * 7 days per run * 24 hours per day) to complete all experimental runs. The aggregate exploit success rate of 73.8% demonstrates that small open-weight models are capable, and for a sophisticated attack that leverages smarter LLMs, the results could be more concerning to security leaders. Unlike vulnerability discovery, exploitation can be tricky even with AI models: In this experiment, the success rates were 52% and 55% for exploiting CVEs and CWEs, respectively. The key takeaway is that autonomy is not magic, as the AI-enabled worm experienced failed exploitation attempts that could be detected if modern security controls were in place. However, rapid enhancements to compute, memory, AI models, and supporting hardware will shorten these time frames and make behavior-based detection harder.
- The threat-actor code of ethics doesn’t require nerfing an AI worm. The researchers intentionally avoided turning their prototype into operationally deployable malware. They refrained from adding evasive capabilities such as encryption, polymorphic code, persistence, forensic cleanup, stealthy traffic shaping, or log suppression. Real-world threat actors are bound by no such rules. The more speculative concern is adversarial software diversity. Threat actors can leverage AI to create a proprietary programming language in which to write such tooling. This makes analysis and incident response hard to scale, as deciphering such drastic changes requires manual intervention. Skilled reverse engineers, behavioral telemetry, sandboxing, and memory analysis still matter. But if AI helps attackers generate unfamiliar tooling faster than defenders can classify it, human-led analysis becomes the bottleneck.
- Visibility and enforcement become nonnegotiable. In production, this lab variant would likely have been easy to detect because it moved slowly and lacked mature evasion techniques. However, a more sophisticated attack would change this equation. Controls such as microsegmentation become essential to contain propagation and limit lateral movement, while complementary tools such as deception technology can add another layer of defense. What makes this AI worm especially concerning is its use of local AI compute resources. So, the new “AI PC” on your employee’s desk can become misused infrastructure in the absence of the right monitoring and response capabilities. The harder challenge sits inside the agentic application layer. Organizations need to know not only what assets they own, but which AI agents exist, what tools those agents can call, what identities they operate under, what data they can access, what permissions they inherit, and what runtime behaviors they generate. Hence, visibility into agent metadata, audit logs, tool-call telemetry, and AI bills of materials becomes part of the enterprise security baseline. Any agentic application that cannot be inventoried, monitored, or enforced must be removed from the network. Additionally, enterprises must account for the fact that patching alone is not a viable solution, as it falls apart when handling legacy systems. This places greater emphasis on deploying and optimizing compensating controls such as virtual patching where applicable, while strengthening the zero-trust approach to your overall security posture. Security teams should also prioritize the implementation of mature (in terms of sophistication of operations and reliability) AI agents for defensive use cases such as triage, investigation, and threat hunting.
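To make the agent inventory point above more concrete, here is a minimal Python sketch of how a team might flag agents that fail a visibility and enforcement baseline. Everything in it is an assumption for illustration: the `AgentRecord` fields, the `visibility_gaps` and `agents_to_remove` helpers, and the example agent names are hypothetical and do not come from the research paper or any specific product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AgentRecord:
    """Hypothetical inventory entry for one AI agent (field names are illustrative)."""
    name: str
    identity: Optional[str] = None                    # identity the agent operates under
    tools: List[str] = field(default_factory=list)    # tools the agent can call
    data_scopes: List[str] = field(default_factory=list)  # data the agent can access
    audit_logging: bool = False                       # are audit logs collected?
    tool_call_telemetry: bool = False                 # are tool calls observable?
    policy_enforced: bool = False                     # can runtime policy be enforced?


def visibility_gaps(agent: AgentRecord) -> List[str]:
    """Return the baseline requirements this agent fails to meet."""
    gaps = []
    if not agent.identity:
        gaps.append("no identity on record")
    if not agent.audit_logging:
        gaps.append("no audit logs")
    if not agent.tool_call_telemetry:
        gaps.append("no tool-call telemetry")
    if not agent.policy_enforced:
        gaps.append("no runtime enforcement")
    return gaps


def agents_to_remove(inventory: List[AgentRecord]) -> Dict[str, List[str]]:
    """Map each agent with at least one gap to the list of its gaps."""
    flagged = {}
    for agent in inventory:
        gaps = visibility_gaps(agent)
        if gaps:
            flagged[agent.name] = gaps
    return flagged


if __name__ == "__main__":
    inventory = [
        AgentRecord(
            name="triage-bot",
            identity="svc-triage",
            tools=["search_logs"],
            audit_logging=True,
            tool_call_telemetry=True,
            policy_enforced=True,
        ),
        AgentRecord(name="shadow-agent"),  # discovered, but nothing else is known
    ]
    for name, gaps in agents_to_remove(inventory).items():
        print(f"{name}: {', '.join(gaps)}")
```

In practice, the records would likely be populated from whatever agent registry, identity provider, and logging pipeline the organization already runs, and the baseline checks would be tuned to its own policy rather than the four shown here.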
Let’s Connect
Forrester clients who have questions about this topic or anything related to threat intelligence can book an inquiry or guidance session with me.