Using AI models to generate exploits for flaws in cryptocurrency contracts isn't necessarily legal, but it looks like a promising business model.
Researchers at University College London (UCL) and the University of Sydney (USYD) in Australia have devised AI agents that can autonomously discover and exploit vulnerabilities in so-called smart contracts.
Smart contracts, which have not always lived up to their name, are self-executing programs on various blockchains that carry out decentralized finance (DeFi) transactions when certain conditions are met.
Systems like A1 can make a profit
Like most programs of sufficient complexity, smart contracts have bugs, and those bugs can put funds within reach of thieves. Last year, the cryptocurrency industry lost nearly $1.5 billion to hacking attacks, according to web3 security platform vendor Immunefi [PDF]. Since 2017, criminals have stolen around $11.74 billion from DeFi platforms.
It appears that AI agents stand to make stealing those funds even easier.
Arthur Gervais, professor of information security at UCL, and Liyi Zhou, lecturer in computer science at USYD, have developed an AI agent system called A1 that uses various AI models from OpenAI, Google, DeepSeek, and Alibaba (Qwen) to generate working smart contract exploits.
They describe the system in a preprint paper entitled “AI Agent Smart Contract Exploit Generation.”
Given a set of target parameters (blockchain, contract address, and block number), the agent selects tools and gathers information to understand the contract's behavior and potential vulnerabilities. It then generates an exploit in the form of a compilable Solidity contract and tests it against historical blockchain state.
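The loop the researchers describe — gather context, ask a model for a Solidity exploit, run it against a fork of historical state, feed failures back — can be sketched roughly as follows. All class and function names here are hypothetical illustrations, not A1's actual API:

```python
# Hypothetical sketch of an exploit-generation agent loop of the kind the
# paper describes; names and structure are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Target:
    chain: str            # e.g. "ethereum" or "bsc"
    address: str          # contract address under test
    block_number: int     # historical state to fork from

def run_agent(target: Target, llm, tools, max_turns: int = 5):
    """Iteratively request an exploit and validate it on forked state."""
    context = tools["fetch_source"](target)           # gather contract code/state
    for _ in range(max_turns):
        candidate = llm.generate_exploit(context)     # Solidity source, as text
        result = tools["execute"](candidate, target)  # run against the fork
        if result.profit > 0:
            return candidate, result.profit           # working, profitable exploit
        context = context + result.feedback           # feed errors back to the model
    return None, 0
```

The five-turn default mirrors the iteration budget the paper reports on, but the tool and model interfaces here are stand-ins.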
When prompted to find vulnerabilities in code, LLMs can find bugs, but they also tend to invent phantom flaws, so many that open source projects like curl have banned AI-generated vulnerability reports.
The A1 agent system therefore comprises a set of tools to make its exploits more reliable. These include a source code fetcher that can resolve proxy contracts, along with separate tools for parameter initialization, reading contract state, sanitizing code, testing code execution, and calculating revenue.
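To give a flavor of what resolving a proxy contract involves: many upgradeable contracts follow the EIP-1967 convention, storing the implementation address in a fixed storage slot. The sketch below uses that standard slot constant; the helper function is an illustrative assumption, not A1's actual code:

```python
# Illustrative proxy resolution per EIP-1967 (a standard Ethereum convention).
# The implementation slot is keccak256("eip1967.proxy.implementation") - 1.
EIP1967_IMPL_SLOT = (
    "0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc"
)

def implementation_address(slot_value: bytes) -> str:
    """Extract the 20-byte implementation address from a 32-byte storage word.

    Storage words are 32 bytes; an address occupies the low-order 20 bytes.
    """
    if len(slot_value) != 32:
        raise ValueError("expected a 32-byte storage word")
    return "0x" + slot_value[12:].hex()

# With web3.py, the slot would be read roughly like this (network access
# assumed, so shown only as a comment):
#   raw = w3.eth.get_storage_at(proxy_address, EIP1967_IMPL_SLOT)
#   impl = implementation_address(bytes(raw))
```

Once the implementation address is known, the fetcher can pull the real (verified) source instead of the thin proxy shim.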
“A1 performs complete, end-to-end exploit generation,” Zhou told The Register by email. “This is important, and it's different from other LLM security tools. The output is not just a report, it's real executable code. A1 is actually closer to a human hacker.”
Tested on 36 real-world vulnerable contracts on the Ethereum and Binance Smart Chain blockchains, A1 showed a success rate of 62.96 percent (17 out of 27) on the VERITE benchmark.
According to the authors, A1 also discovered nine additional vulnerable contracts, five of which emerged after the training cutoff of OpenAI's o3-pro, the best-performing model. This matters because it indicates the model is not simply regurgitating vulnerability information that was available during training.
“Across all 26 successful cases, A1 extracts up to US$8.59 million per case, totaling US$9.33 million,” the paper reports. “We analyzed iteration-wise performance through 432 experiments across six LLMs, finding average marginal gains of +9.7 percent, +3.7 percent, +5.1 percent, and +2.8 percent at iterations 2 through 5, at a cost of $0.01 to $3.59 per experiment.”
The researchers tested A1 with various LLMs: o3-pro (OpenAI o3-pro-2025-06-10), o3 (OpenAI o3-2025-04-16), Gemini Pro (Google gemini-2.5-pro), Gemini Flash (Google gemini-2.5-flash-preview-04-17), R1 (DeepSeek R1-0528), and Qwen3 MoE (Qwen3-235B-A22B).
OpenAI's o3-pro and o3 had the highest success rates, 88.5 percent and 73.1 percent respectively, given a five-turn budget in which the model iterates within the agent loop. The two o3 models also captured 69.2 percent and 65.4 percent of the maximum possible revenue from exploited contracts, maintaining strong revenue optimization.
Exploits of this type can also be identified through manual code analysis along with static analysis and dynamic fuzzing tools. But the authors observe that manual methods are limited by the volume and complexity of smart contracts, the slowness and scarcity of human security experts, and the high false positive rates of existing automated tools.
In theory, someone could deploy A1 and earn more from exploits than it costs to operate, assuming law enforcement doesn't intervene.
“Systems like A1 can be profitable,” Zhou explained. “To give you a concrete example [from the paper], Figure 5 shows that o3-pro remains profitable even if only one in every 1,000 scans leads to a real vulnerability, as long as the vulnerability was introduced within the last 30 days.”
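The break-even argument Zhou refers to is simple expected-value arithmetic. The sketch below uses illustrative numbers (not the paper's exact Figure 5 inputs): if each scan hits a real vulnerability with some small probability, scanning pays as soon as hit rate times exploit value exceeds per-scan cost:

```python
# Back-of-the-envelope scan economics; all figures here are illustrative
# assumptions, not the paper's exact parameters.

def expected_profit(hit_rate: float, exploit_value: float,
                    scan_cost: float, n_scans: int) -> float:
    """Expected profit of n_scans, each hitting with probability hit_rate."""
    return n_scans * (hit_rate * exploit_value - scan_cost)

# One real vulnerability per 1,000 scans, a modest $10,000 exploit,
# and scans that cost a few dollars each still come out ahead:
profit = expected_profit(hit_rate=1 / 1000, exploit_value=10_000,
                         scan_cost=3.0, n_scans=1000)  # positive
```

The "last 30 days" caveat enters through the hit rate: older flaws are more likely already found and patched, pushing the effective hit rate toward zero.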
Zhou said the time window matters because older vulnerabilities have likely already been discovered by researchers and patched by users.
“It's not easy to find fresh bugs like this, but it is possible, especially at large scale. Once a few valuable exploits are discovered, they can easily pay for the cost of running thousands of scans. We also expect that as AI models continue to improve, they will find these vulnerabilities more readily, making the system even more effective over time.”
Asked whether A1 had discovered any zero-day vulnerabilities in the wild, Zhou replied, “There are no zero-days in this paper.”
The paper concludes by warning of a tenfold asymmetry between attackers' rewards and defenders' rewards when attackers use AI tools and defenders rely on traditional ones. Essentially, the authors argue that bug bounty payouts need to approach exploit value, or the cost of defensive scanning must fall by orders of magnitude.
“Finding one vulnerability requires roughly 1,000 scans, at a cost of about $3,000,” the paper says. “A $100,000 exploit funds 33k future scans for an attacker, while the defender's $10,000 bounty covers only 3.3k. This order-of-magnitude difference in reinvestment capacity leads to diverging scanning capabilities.”
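Taking the quoted figures at face value (roughly $3 per scan, and a white hat bounty worth about a tenth of the exploit), the reinvestment gap works out as a simple division. The numbers below are illustrative assumptions drawn from the quoted example:

```python
# Sketch of the reinvestment asymmetry in the quoted example.
# Assumed figures: ~$3,000 per 1,000 scans, a 10% white hat bounty.

SCAN_COST = 3.0  # dollars per scan (assumed constant)

def scans_funded(payout: float) -> int:
    """How many future scans a single payout buys at SCAN_COST per scan."""
    return int(payout / SCAN_COST)

attacker_scans = scans_funded(100_000)  # exploit worth the full prize: ~33k
defender_scans = scans_funded(10_000)   # a 10% bounty on the same flaw: ~3.3k
```

Each round, the side with ten times the scan budget expects roughly ten times the discoveries, so the gap compounds, which is the "divergence of scanning capabilities" the paper warns about.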
The risk of incarceration could change that calculus somewhat. But given the current regulatory environment in the US and an estimated cybercrime enforcement rate of 0.05 percent, it amounts to a small risk adjustment.
Zhou argues that the cost gap between offense and defense is a serious challenge.
“My recommendation is that project teams should use tools like A1 to continuously monitor their own protocols, rather than waiting for third parties to find the issue,” he said. “An attacker's potential payoff is the contract's entire TVL [total value locked], while white hat rewards are often capped at around 10 percent.”
“That asymmetry makes it hard to compete without proactive security. If you rely on a third-party team, you're essentially trusting them to act in good faith and stay within the 10 percent bounty. That's a very strange assumption from a security standpoint; normally, when modeling security problems, we assume all players are financially rational.”
A July 8 draft of the paper said the researchers planned to release A1 as open source code. But Zhou said otherwise when asked about the availability of the source.
“Given how strong A1 is, and the concerns above, we're still not sure it's the right move, so we have removed the open source mention (arXiv will show the change tomorrow),” he said. ®
