Ransomware Groups Used AI to Build Better Attack Tools
Ransomware groups are now using commercial AI tools to systematically engineer malware that defeats specific endpoint security products. Sophos published research on June 2, 2026, documenting a case where attackers used Claude Opus 4.5 and Cursor, an AI-powered coding IDE, to build and test roughly 80 evasion modules against three of the most widely deployed endpoint detection products in business environments. For any growing business treating a single endpoint tool as the core of its security posture, this is a useful moment to reassess that assumption.
What Sophos Actually Found
The Sophos Counter Threat Unit published its findings from an investigation triggered by alerts on a customer's compromised host. Scripts found on the machine, written in Russian, pointed to an organized attack infrastructure that included dedicated virtual machines, a command-and-control server running Sliver (a post-exploitation framework), and Cobalt Strike with custom profiles built to mimic legitimate network traffic.
The AI-assisted component was more methodical than most coverage suggests. The group used Claude Opus 4.5 as an orchestrating agent to manage sub-agents handling different tasks: EDR evasion testing, OPSEC hardening, documentation, and lab provisioning. Cursor served as the development environment where the malware code was actually written. The workflow looked like a structured R&D lab, not a one-off exploit.
The three endpoint security products targeted for specific evasion testing were Sophos, CrowdStrike, and Microsoft Defender. Each ran on a dedicated Windows Server 2022 virtual machine. The team built nearly 80 modules, sourced evasion techniques from published security research by Kaspersky, Palo Alto Networks, Bishop Fox, and SpecterOps, and mapped each technique to MITRE ATT&CK before testing.
The goal was a repeatable development process for defeating endpoint products at scale, not a single novel exploit.
Why Breakout Times Change the Math
This matters more when you account for how fast attackers move once they have a foothold.
CrowdStrike's 2026 Global Threat Report found the average criminal threat actor breakout time, measured from initial access to lateral movement across the network, is now 29 minutes. The fastest recorded instance in their dataset was 27 seconds.
Mandiant's M-Trends 2026 report found that once attackers gain initial access, they reach Active Directory in under 3.5 hours on average. IBM X-Force 2026 tracked a 49 percent year-over-year increase in active ransomware groups.
An attacker who has specifically engineered tools to evade your endpoint product does not need to fight through layers of alerts. They move fast through an environment where the primary detection tool cannot see them clearly. The window for human intervention narrows considerably.
The Problem with "We Have EDR"
Endpoint detection and response software is a real layer of defense. The issue is the gap between owning a product and running an effective security program.
The Sophos team noted that the agent-generated reports from the attacker's framework claimed success in defeating "almost all EDR solutions." They also flagged that the documented test outputs did not fully support those claims. Sophos noted this gap may reflect LLM overconfidence in the agents' own reporting. But three specific products were targeted, tested repeatedly, and mapped to documented evasion techniques.
This is not a story about a product being defective. Every major endpoint product has sophisticated detection capabilities. The story is about what happens when threat actors run organized development programs specifically against those products. A configuration that was tuned 18 months ago may not reflect what is being tested against it today.
Growing businesses in the 25-to-150-person range commonly deploy a single endpoint product, configure it once, and treat the box as checked. That worked better when attackers were improvising from generic toolkits. It works less well when they are running structured research programs against specific products they expect to encounter.
What Managed Monitoring Catches That Products Miss
Automated detection tools catch what they are built to detect. People catch anomalies.
Active monitoring surfaces behavioral signals that product signatures miss. A network traffic spike at 3 a.m. A domain controller account suddenly accessing file shares it has never touched. A Windows process writing to registry locations outside its normal pattern. These are not signature failures. They are environmental signals that require context to evaluate. That context does not come pre-configured.
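To make the "never touched before" signal concrete, here is a minimal sketch of a first-seen check. It assumes access logs have already been reduced to (account, share) pairs; the function name, account names, and share names are hypothetical and do not come from the Sophos research or any specific product.

```python
from collections import defaultdict

def find_new_access(baseline_events, recent_events):
    """Return (account, share) pairs in recent_events that never appear in the baseline."""
    seen = defaultdict(set)
    for account, share in baseline_events:
        seen[account].add(share)

    flagged = []
    for account, share in recent_events:
        if share not in seen[account]:
            flagged.append((account, share))
            seen[account].add(share)  # report each new pair only once
    return flagged

# Hypothetical data: a baseline period followed by last night's activity.
baseline = [("svc-backup", "backup-share"), ("jsmith", "sales-share")]
recent = [
    ("jsmith", "sales-share"),
    ("svc-backup", "finance-share"),
    ("svc-backup", "finance-share"),
]
new_access = find_new_access(baseline, recent)
print(new_access)
```

A flagged pair is only a prompt for review. Deciding whether a backup account touching a finance share is expected still takes someone who knows the environment.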
The Sophos research specifically highlights Active Directory discovery and enumeration as a pre-attack indicator. The attacker's framework used an agent-driven loop to automate AD discovery: collect results, choose next action, dispatch to remote agents, re-evaluate. AD enumeration before ransomware deployment is a known and documented precursor pattern. Catching it requires monitoring AD event logs with enough environmental context to recognize what normal activity looks like.
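On the defensive side, one simple way to notice enumeration is to count how many distinct directory objects a single account reads in a short window. The sketch below assumes directory access records (for example, Windows directory service access events such as Event ID 4662, where that auditing is enabled) have already been parsed into (timestamp, account, object name) tuples. The function name, the five-minute window, and the threshold of 50 are illustrative assumptions, not values from the Sophos report.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def find_enumeration_bursts(events, window=timedelta(minutes=5), threshold=50):
    """Flag accounts that read at least `threshold` distinct objects within `window`.

    events: iterable of (timestamp, account, object_name), sorted by timestamp.
    """
    recent = defaultdict(deque)  # account -> (timestamp, object_name) pairs in the window
    flagged = set()
    for ts, account, obj in events:
        queue = recent[account]
        queue.append((ts, obj))
        # Drop records that have aged out of the sliding window.
        while queue and ts - queue[0][0] > window:
            queue.popleft()
        if len({name for _, name in queue}) >= threshold:
            flagged.add(account)
    return flagged

# Hypothetical data: one account reads 60 objects in a minute, another reads one.
start = datetime(2026, 6, 2, 3, 0, 0)
events = [(start + timedelta(seconds=i), "svc-web", f"CN=user{i}") for i in range(60)]
events.append((start + timedelta(seconds=30), "jsmith", "CN=printer1"))
events.sort(key=lambda e: e[0])

suspects = find_enumeration_bursts(events)
print(suspects)
```

The threshold is the part that needs environmental context. An identity sync service may legitimately read thousands of objects, so a usable value depends on what normal looks like for each account.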
That is not a product capability. That is an operational one. It requires someone watching, understanding the environment, and acting when something looks wrong before a ransom note appears.
For more background on how attackers exploit gaps in detection coverage, the Verizon 2026 DBIR analysis covered the most common breach patterns in detail. The short version: phishing, unpatched systems, and credential theft remain the top entry points. AI-assisted development affects how fast attackers move after entry, not how they typically get in.
Questions Worth Asking About Your Current Setup
If your security setup relies primarily on a single endpoint product, a few specific questions are worth putting to your IT provider or internal team.
Who reads the alerts, and how fast? Most EDR products generate alerts. The question is whether someone is reviewing them with enough context to distinguish a real incident from a false positive, and whether the response process is faster than a 29-minute attacker breakout window.
Is Active Directory event logging in scope? Products built for endpoint protection often have limited visibility into AD-level activity. If AD enumeration is a pre-attack indicator, it needs to be on a monitored dashboard.
When was the configuration last reviewed? Endpoint security deployed and left on default settings may not reflect current evasion techniques. The Sophos research shows attackers are actively mapping their tools to the specific detection logic of major products. Configuration hygiene matters.
What happens in the first hour of an incident? Speed matters. With an average breakout time of 29 minutes, an organization without a practiced incident response process is likely to be behind before anyone has acted on the first alert.
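The first and last questions can be checked against your own alert history. As a rough sketch, assuming each alert record carries the time it was raised and the time it was contained (the field names and sample records are hypothetical), the comparison against the 29-minute figure looks like this:

```python
from datetime import datetime, timedelta

# Average breakout time cited in this article, used here as the response target.
BREAKOUT_WINDOW = timedelta(minutes=29)

def slow_responses(alerts, limit=BREAKOUT_WINDOW):
    """Return IDs of alerts that were never contained or took longer than `limit`."""
    late = []
    for alert in alerts:
        contained = alert["contained"]
        if contained is None or contained - alert["raised"] > limit:
            late.append(alert["id"])
    return late

# Hypothetical alert history.
alerts = [
    {"id": "A-1", "raised": datetime(2026, 6, 2, 3, 0), "contained": datetime(2026, 6, 2, 3, 12)},
    {"id": "A-2", "raised": datetime(2026, 6, 2, 3, 5), "contained": datetime(2026, 6, 2, 4, 40)},
    {"id": "A-3", "raised": datetime(2026, 6, 2, 9, 0), "contained": None},
]
late = slow_responses(alerts)
print(late)
```

If most overnight or weekend alerts land in the late list, that is a staffing and process gap rather than a product gap.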
The Windows Secure Boot certificate deadline coverage from June 1 is a related example of where timing matters in security maintenance. Managed IT handles these deadlines proactively. AI-built attack tooling is another case where the same principle applies at the monitoring layer.
The Bigger Pattern
This research is one documented case. But it reflects something broader: the cost of building sophisticated attack tools is falling. Commercial AI platforms compress what used to require a team of skilled developers into a workflow that a smaller group can run with a structured process.
AI did not invent ransomware. It lowered the development cost and shortened the iteration cycle for evasion tooling. That change benefits attackers more than defenders, at least in the short term. Defenders still need to catch everything. Attackers only need to find what gets through.
The response to that asymmetry is not a better product. It is layered defense with active human monitoring. Products handle the volume. People handle the judgment calls. That combination is what actually holds when an organized group is running a development program against your tools.
If you want to understand how your current security stack holds up against evolving threats, reach out through our managed IT services page for a conversation before an incident becomes the test.
Frequently Asked Questions
What is EDR and why does it matter for my business? Endpoint detection and response (EDR) software monitors computers and servers for suspicious behavior, alerting security teams to potential threats. It is a significant improvement over traditional antivirus, but it relies on behavioral signatures and configured rules that sophisticated attackers can test and engineer around given enough time and tools.
Can ransomware groups really use AI tools like Claude to build malware? Yes. Sophos CTU published research on June 2, 2026, confirming that an unnamed ransomware group used Claude Opus 4.5 as an orchestrating agent and Cursor as their development environment to systematically build malware evasion tools. The group framed the work as "red teaming" to avoid triggering AI safety guardrails.
What should my business do if we only have one endpoint security product? Talk to your IT provider about layered defense: EDR plus network monitoring, Active Directory event logging, and a documented incident response process. Single-product coverage leaves detection gaps that organized threat actors specifically test and exploit.
How fast do ransomware attacks move once attackers are inside a network? CrowdStrike's 2026 Global Threat Report found the average attacker breakout time is 29 minutes from initial access to lateral movement. The fastest recorded instance in their dataset was 27 seconds. Detection and response needs to operate faster than that window.
Does this mean endpoint security products are useless? No. EDR tools are a core layer of a sound defense. The problem is treating any single tool as a complete solution. The Sophos research shows organized groups are running development programs specifically against named products. Defense-in-depth with active monitoring is what holds when that happens.