Researchers use BYO LLM to build self-replicating AI worms

A team of researchers at the University of Toronto in Canada has created a worm, a piece of self-replicating malware that penetrates networks, which devises new attack strategies for each machine it encounters rather than relying on fixed, specific exploits.

The CleverHans Lab team, led by Associate Professor Nicolas Papernot, also built the worm around a small, freely available large language model (LLM), showing that it does not need large commercial infrastructure to run.

The worm carries a copy of an open-weight LLM small enough to run on a single graphics processing unit (GPU), which the malware executes on machines it has already compromised.

Each newly compromised host provides additional computing resources as well as a foothold for the malware.

This allows the worm to survive by parasitizing the victim’s infrastructure.

Devices that cannot host the model themselves, such as low-resource Internet of Things (IoT) sensors, instead forward inference queries to infected nodes that have GPUs.

Papernot’s team and researchers from the Vector Institute and the University of Cambridge tested the worm in an isolated 33-host virtual environment with Linux servers, Windows machines, and IoT devices, as described in their preprint.

The hosts were configured with vulnerabilities commonly found in corporate environments, such as reused passwords and unpatched software.

Across 15 independent 7-day runs, the worm prototype correctly identified an average of 31.3 vulnerabilities per trial.

The worm gained access to an average of 23.1 hosts and extended that access on 20.4 of them, almost two-thirds of the test network.
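As a quick check on the "almost two-thirds" figure, the two reported host averages can be set against the 33-host test network. The snippet below is only illustrative arithmetic; the function and variable names are made up for this example.

```python
# Illustrative arithmetic only: names are hypothetical, figures are the
# averages reported in the article for the 33-host test network.
TOTAL_HOSTS = 33


def share_of_network(hosts: float, total: int = TOTAL_HOSTS) -> float:
    """Return the fraction of the test network that `hosts` represents."""
    return hosts / total


if __name__ == "__main__":
    for label, hosts in (("first figure", 23.1), ("second figure", 20.4)):
        print(f"{label}: {hosts} hosts = {share_of_network(hosts):.1%} of the network")
```

The second figure works out to roughly 62% of the network, just under two-thirds (about 66.7%).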

Individual exploitation attempts were successful in 44% of cases, with most failures caused by malformed payloads rather than flawed attack strategies.

The worm’s performance was worst against web application structures, Windows command environments, and tasks requiring precise string manipulation, but the team attributed this to the code generation limits of current-generation single-GPU models rather than a fundamental design flaw.

The researchers say the limitation is temporary.

“These reflect the code generation limits of current generation single-GPU models, rather than fundamental constraints on the approach, and we expect those limits to narrow as the code generation and structured output of language models improves,” they write.

Despite the per-attempt failure rate, the worm’s swarm architecture compensated by running multiple independent inference trajectories in parallel.
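A rough way to see why parallel attempts can offset a 44% per-attempt success rate is the standard "at least one success" calculation. The sketch below assumes every attempt succeeds independently and with the same probability. The article does not state this, so treat it as a simplification rather than the researchers' own analysis. The function name is hypothetical.

```python
def at_least_one_success(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds.

    Assumes each try succeeds with the same probability `p`, independently
    of the others. This is an illustrative simplification.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement rule: 1 minus the chance that every attempt fails.
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(f"{k} attempt(s): {at_least_one_success(0.44, k):.1%}")
```

Under these assumptions, two attempts give about a 69% chance of at least one success, and five give about 94%. Real attempts against the same host are unlikely to be fully independent, so actual gains would probably be smaller.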

In testing, the prototype successfully exploited recent copy failures, dirty flags, and remote code execution flaws in Marimo by reading public security advisories at runtime and building working exploits solely from that information.

The worm was also able to repair itself without human intervention.

When a bug in VM detection caused a replica to crash on Alpine Linux and Windows Server 2008 hosts, the parent worm located the attestation source file on the target machine, removed the failed check, and successfully retried.

AI safety controls offer no protection

Because the worm runs entirely on a locally hosted open-weight model, commercial platform controls such as refusing service, content filtering, and rate limiting cannot protect against this type of attack.

The open-weight model’s safety guardrails can also be circumvented if an attacker takes control of the local execution environment, the researchers said.

“Traditional economic barriers to cybersecurity have collapsed,” the paper claims.

“The worm parasitically uses the victim’s own computational resources, reducing the attacker’s marginal cost to zero,” the researchers wrote.

Defenses against such worms, on the other hand, include network microsegmentation, zero-trust architecture, and searching for detectable signatures, along with AI-assisted penetration testing and fuzzing to find exploitable weaknesses before an attacker does. However, the latter is a proof-of-concept artifact.
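To illustrate the microsegmentation idea, the sketch below models a default-deny policy in which traffic between network segments is permitted only if it appears on an explicit allowlist. The segment names, address ranges, port, and function names are all hypothetical and do not come from the paper. Real deployments enforce such rules in firewalls or network policy engines rather than in application code.

```python
import ipaddress

# Hypothetical segments and address ranges, for illustration only.
SEGMENTS = {
    "iot": ipaddress.ip_network("10.0.10.0/24"),
    "workstations": ipaddress.ip_network("10.0.20.0/24"),
    "gpu-servers": ipaddress.ip_network("10.0.30.0/24"),
}

# Explicit allowlist of (source segment, destination segment, port).
# Anything not listed here is denied, including same-segment traffic.
ALLOWED_FLOWS = {
    ("workstations", "gpu-servers", 443),
}


def segment_of(address: str):
    """Return the name of the segment containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in SEGMENTS.items():
        if ip in network:
            return name
    return None


def is_flow_allowed(src: str, dst: str, port: int) -> bool:
    """Default-deny check: permit only flows on the explicit allowlist."""
    src_segment = segment_of(src)
    dst_segment = segment_of(dst)
    if src_segment is None or dst_segment is None:
        return False  # Unknown hosts are never trusted.
    return (src_segment, dst_segment, port) in ALLOWED_FLOWS


if __name__ == "__main__":
    print(is_flow_allowed("10.0.20.5", "10.0.30.7", 443))  # workstation to GPU server
    print(is_flow_allowed("10.0.10.9", "10.0.30.7", 443))  # IoT sensor to GPU server
```

Under a policy like this, a compromised IoT sensor could not reach a GPU server directly. That is the kind of lateral movement segmentation is meant to limit.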

The University of Toronto has not made the prototype publicly available, but has established a review process that allows qualified researchers to request access for defensive purposes.

Not the only AI worm

Prior to the CleverHans Lab research, a joint team from Peking University, Sun Yat-sen University, Wuhan University, Tsinghua University, and Singapore Management University released ClawWorm in March this year.

ClawWorm demonstrated a self-replicating attack against OpenClaw, an open source agent framework with over 40,000 active instances.

The worm achieves a fully autonomous infection cycle from a single message. It hijacks the victim’s core configuration to establish session persistence across reboots, executes the payload on every reboot, and propagates to all newly discovered peers without further interaction from the attacker.

“ClawWorm is the first fully autonomous self-replicating worm targeting a production-scale LLM agent ecosystem.

“It achieves persistent persistence, executes arbitrary payloads, and autonomously spreads to new agents, highlighting serious structural vulnerabilities in current agent architectures,” the researchers wrote.

The researchers said they achieved a total success rate of 64.5% in a controlled testbed attack evaluation across four LLM backends.

The researchers have published the ClawWorm project site on GitHub.
