Microsoft MDASH: An AI Just Found 16 Critical Windows Bugs
TL;DR - Microsoft just unveiled MDASH (multi-model agentic scanning harness), a security AI built by its Autonomous Code Security, MORSE and Windows Attack Research & Protection teams. It orchestrates over 100 specialised agents across an ensemble of frontier and distilled models. In the run-up to the May 2026 Patch Tuesday, MDASH found 16 new Windows vulnerabilities, including four critical remote code execution flaws. It also scored 88.45% on the public CyberGym benchmark, topping the leaderboard and beating Anthropic's Claude Mythos Preview. What you need to do: deploy the May 2026 Patch Tuesday updates urgently, rethink the speed of your vulnerability management cycle, and start a conversation about how AI-discovered bugs will land in your shop.
MDASH By The Numbers
| Metric | Result |
|---|---|
| New Windows vulnerabilities found before Patch Tuesday | 16 (10 kernel-mode, 6 user-mode) |
| Critical remote code execution flaws in that 16 | 4 |
| Specialised agents in the harness | Over 100 |
| Planted vulnerabilities found on StorageDrive test | 21 of 21, zero false positives |
| Historical recall on clfs.sys MSRC cases (5 years) | 96% (28 cases) |
| Historical recall on tcpip.sys MSRC cases (5 years) | 100% (7 cases) |
| CyberGym benchmark score (1,507 real-world vulns) | 88.45% (top of public leaderboard) |
| Comparable GPT-4.1 + Claude 3.7 Sonnet panel score one year earlier | ~15% |
🎙️ This blog unpacks a conversation from Out of Band: A Microsoft Security Podcast, where Andrew O'Young (Microsoft MVP, Informotion), Anthony Porter (Canon Business Services) and I work through the May 2026 Microsoft security news. If you'd rather listen than read, the full episode is on YouTube.
I scan a lot of Microsoft Patch Tuesday write-ups. Most of them blur together. Same vendor names, same severities, same "rated important" boilerplate.
This one made me put my coffee down.
On the same day Microsoft shipped its May 2026 cumulative updates, the Microsoft Security Blog quietly published a research post titled Defense at AI speed. Buried in it: an internal AI security system called MDASH had spent the previous cycle finding 16 previously unknown vulnerabilities in the Windows networking and authentication stack. Four of those were critical remote code execution. Every single one got patched on 12 May 2026.
In a single year, Microsoft's vulnerability-hunting AI has gone from scoring ~15% on the standard benchmark to scoring 88.45%. To put that in perspective, the same benchmark a year ago was the wall every model bounced off. Now Microsoft is leading the public leaderboard and Anthropic's Mythos Preview, the model that put the entire industry on edge last month, is in second place.
If you missed the Claude Mythos Preview story - the AI that escaped its sandbox, found thousands of zero-days, and built working exploits for under $2,000 - you should read that first. MDASH is the defensive twin of that story, and the two together tell you exactly where security is heading.
Let me walk you through what MDASH actually is, the bugs it just found, and what it changes for the rest of us who run Microsoft estates.
What MDASH Actually Is
MDASH is short for multi-model agentic scanning harness. It's a research system built by three Microsoft teams working in concert:
- Autonomous Code Security (ACS) - the primary development team
- Microsoft Offensive Research & Security Engineering (MORSE)
- Windows Attack Research and Protection (WARP)
The headline architectural choice is in the name: it is model-agnostic. MDASH doesn't bet on one large language model. Instead, it runs what Microsoft calls a configurable panel of models - a mix of state-of-the-art frontier models, distilled lighter-weight models, and a separate SOTA model used for adversarial review. Over 100 specialised agents sit on top of that panel, each tuned to a specific vulnerability class (memory corruption, type confusion, authentication bypass, parser bugs, and so on).
The agents argue with each other. One agent proposes a candidate vulnerability. Another tries to disprove it. A third tries to build a working proof of concept. A fourth checks whether the PoC would actually execute against current Windows builds. Only candidates that survive the full debate become reported findings.
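Microsoft hasn't published MDASH's internals, so the propose / disprove / prove / validate loop described above can only be sketched. In this toy Python version, every class name, agent stub and gating rule is invented for illustration; real MDASH agents are LLM-backed reviewers, not three-line predicates:

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    """A proposed vulnerability moving through the review pipeline."""
    component: str
    vuln_class: str
    notes: list[str] = field(default_factory=list)

# Each "agent" is a reviewer stage: returning False kills the candidate.
# These stubs only encode the gate order, not real analysis.

def disprover(c: Candidate) -> bool:
    # Try to refute the finding; here, reject classes we don't model.
    return c.vuln_class in {"memory-corruption", "auth-bypass", "parser"}

def poc_builder(c: Candidate) -> bool:
    # Attempt a proof of concept; stub: pretend it fails for auth bypasses.
    c.notes.append("poc-attempted")
    return c.vuln_class != "auth-bypass"

def build_validator(c: Candidate) -> bool:
    # Check the PoC would still fire on current Windows builds.
    c.notes.append("validated-on-current-build")
    return True

PIPELINE = [disprover, poc_builder, build_validator]

def debate(candidates: list[Candidate]) -> list[Candidate]:
    """Only candidates that survive every reviewer become reported findings."""
    return [c for c in candidates if all(stage(c) for stage in PIPELINE)]

findings = debate([
    Candidate("tcpip.sys", "memory-corruption"),
    Candidate("netlogon.dll", "auth-bypass"),   # killed: PoC stage fails
    Candidate("dnsapi.dll", "type-confusion"),  # killed: disprover rejects
])
```

The point of the shape: a reported finding is never one model's opinion, it's whatever survives every adversarial stage.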
Microsoft summed up the design philosophy in a single line from Taesoo Kim, Microsoft's VP of Agentic Security:
"The harness does the work, and the model is one input."
That's the part that should stick. The big news in agentic AI security isn't "we got a smarter model". It's "we built the scaffolding that makes a panel of models reliably good at a specialist job".
The 16 Windows Bugs It Found
The May 2026 cumulative updates patched 16 vulnerabilities MDASH discovered in the networking and authentication surface. Microsoft published the CVE list and the affected component breakdown:
Kernel-mode (10 vulnerabilities)
- tcpip.sys - the Windows TCP/IP stack
- http.sys - the kernel-mode HTTP listener
- ikeext.dll - IKEv2 authentication
User-mode (6 vulnerabilities)
- netlogon.dll - domain authentication
- dnsapi.dll - DNS client
- telnet.exe - yes, telnet, still here, still findable
The CVE numbers (for your patch reporting): CVE-2026-33827, CVE-2026-40413, CVE-2026-40405, CVE-2026-33824, CVE-2026-40406, CVE-2026-35422, CVE-2026-32209, CVE-2026-35424, CVE-2026-35423, CVE-2026-40414, CVE-2026-40401, CVE-2026-40415, CVE-2026-33096, CVE-2026-40399, CVE-2026-41089 and CVE-2026-41096.
Four of those are critical remote code execution. If you've been treating Patch Tuesday as "I'll get to it by end of month", you've been treating it wrong for a while. This month, do not treat it that way.
The bugs are real. They were found by an AI. They were patched the same week they were discovered. The only thing standing between an attacker and your unpatched Windows servers is how fast you deploy KB5089549 / KB5087420 through your change management.
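"How fast you deploy" is something you can measure rather than estimate. Here's a minimal sketch that flags hosts in an inventory dump still missing this month's updates - the two KB numbers come from the post above, but the inventory format and every hostname and extra KB are hypothetical:

```python
# Hosts mapped to the hotfix IDs they report (e.g. exported from your RMM
# or collected from Get-HotFix output). Entirely made-up example data.
inventory = {
    "dc01":  {"KB5089549", "KB5062553"},
    "web01": {"KB5062553"},
    "app02": {"KB5087420"},
}

MAY_2026_KBS = {"KB5089549", "KB5087420"}  # the KBs named in the post

def unpatched(inv: dict[str, set[str]]) -> list[str]:
    """Hosts with neither of this month's cumulative updates installed."""
    return sorted(h for h, kbs in inv.items() if not (kbs & MAY_2026_KBS))

exposed = unpatched(inventory)  # -> the hosts still on the attacker's side
```

A report like this, run daily against your RMM export, turns "we're probably mostly patched" into a named list of exposed machines.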
The CyberGym Result That Beat Mythos
The other half of the announcement is the benchmark.
CyberGym is a public, third-party evaluation suite with 1,507 real-world vulnerabilities drawn from production codebases. It's the closest thing the industry has to a standardised "can your AI actually find bugs that exist in the wild" test. Models score on whether they discover the vulnerability and whether they can reason about exploitability without being told what they're looking for.
MDASH scored 88.45%.
For context:
- The next entry on the public CyberGym leaderboard at publication was roughly five points behind MDASH. That entry was an Anthropic submission built on Claude Mythos Preview, the model the entire security industry was talking about a month ago.
- Last year, the best public results from a panel of GPT-4.1 + Claude 3.7 Sonnet sat around 15%.
A six-fold increase in benchmark performance in twelve months is not "the field improved a bit". It's a generational jump, the same shape we saw when AI blew past humans at chess and Go. Once an evaluation that used to discriminate between models stops discriminating, you know the underlying capability has crossed a threshold.
Same time last year, an AI vulnerability scanner couldn't reliably find one bug in five. Now Microsoft's is finding nine in ten - and the bugs are real.
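A leaderboard score like this is, at bottom, a detection-rate calculation. CyberGym's real rubric is richer (it also scores exploitability reasoning), but the headline arithmetic is simple - and note the absolute case counts below are back-calculated assumptions, not published figures:

```python
def benchmark_score(found: int, total: int) -> float:
    """Detection rate as a percentage, rounded to two decimals."""
    return round(100 * found / total, 2)

# If 88.45% were a plain detection rate over CyberGym's 1,507 cases,
# MDASH found about 1,333 of them; last year's ~15% panel, roughly 226.
this_year = benchmark_score(1333, 1507)   # -> 88.45
last_year = benchmark_score(226, 1507)    # -> 15.0
```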
The Comparison That Should Worry You
Read the line about the harness one more time: "The harness does the work, and the model is one input."
Single-model security tooling - the kind most vendors are still selling - is rapidly becoming the slow lane. The systems that consistently top the leaderboards are agentic harnesses with multiple models arguing in parallel, multiple specialist agents per class of bug, and an evaluation loop that filters false positives before a human ever sees them.
That's a problem for two groups:
- Vendors who sell "AI-powered" scanners that are actually a single LLM with a vulnerability prompt. That market is about to compress hard.
- Defenders running quarterly vulnerability cycles. The attackers' equivalent of MDASH is a question of when, not if. If your Patch Tuesday SLA is 30 days because "that's what the framework says", your framework was written before AI could find a critical RCE in a kernel network stack inside a single research cycle.
It's also the part the team behind MDASH was explicit about. From the announcement: "AI vulnerability discovery has crossed from research curiosity into production-grade defense at enterprise scale, and the durable advantage lies in the agentic system around the model rather than any single model itself."
That's a polite way of saying the era of bolt-on AI security is closing.
Why This Could Be Genuinely Good For Security
Before the doom register kicks in, the optimistic case is real.
For two decades the defender side of the asymmetry has been losing. There aren't enough security researchers. There aren't enough hours in their days. Critical Windows components like tcpip.sys have decades of accumulated complexity and a single team of humans cannot meaningfully audit all of it.
MDASH achieved 100% recall against five years of confirmed MSRC cases in tcpip.sys. Translated: every documented tcpip.sys bug the human MSRC team has acknowledged over the last five years, MDASH would have found in advance - and on the separate planted-vulnerability test it reported zero false positives. That is a fundamentally different posture for Windows kernel security.
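"Recall" here is the standard metric: of the bugs known to exist, what fraction did the scanner rediscover on its own. A quick sketch matching the counts in the table at the top of this post - the case IDs are invented placeholders:

```python
def recall(rediscovered: set[str], known: set[str]) -> float:
    """Share of historically confirmed cases the scanner found unaided."""
    return len(rediscovered & known) / len(known)

# tcpip.sys: 7 confirmed MSRC cases over five years, all 7 rediscovered.
tcpip_known = {f"MSRC-TCPIP-{i}" for i in range(7)}
tcpip_recall = recall(tcpip_known, tcpip_known)          # -> 1.0

# clfs.sys: 28 confirmed cases; ~96% recall implies 27 of 28 rediscovered.
clfs_known = {f"MSRC-CLFS-{i}" for i in range(28)}
clfs_found = {f"MSRC-CLFS-{i}" for i in range(27)}
clfs_recall = recall(clfs_found, clfs_known)             # -> 0.964...
```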
The same logic applies to every large vendor with a private agentic harness. Google has Big Sleep. Anthropic has Project Glasswing. Microsoft now has MDASH plus the existing Microsoft Threat Intelligence Center pipeline. Each of these teams is shipping more secure software than they were twelve months ago, and the rate of improvement is accelerating.
That doesn't fix the long tail. There are tens of thousands of small vendors and tens of millions of legacy systems with no such pipeline. But "Windows kernel" sits at the centre of half the world's compute, and Windows kernel just got measurably safer. That matters.
What This Means For You Right Now
If you run a Microsoft estate of any size:
- Deploy the May 2026 Patch Tuesday updates this week. Not this month. This week. Four critical RCE flaws in the networking stack are not theoretical.
- Audit your patching SLA against this news cycle. If your standard SLA for critical RCE is more than seven days, you are accepting a risk profile your stakeholders almost certainly haven't signed off on, given that vulnerabilities are now being discovered and weaponised inside the same research cycle.
- Revisit your conditional access policies. If you've delayed deploying token protection or sign-in risk policies because they're "tricky", that delay just got more expensive. Microsoft's Conditional Access policy templates are a good starting point.
- Watch the MDASH preview waitlist. It is currently in limited private preview. When it expands, it will change how you do code review on internal applications.
- Treat AI as a security force multiplier, not a threat surface. AI red team capability is here. Defenders who refuse to use it will lose to attackers who will.
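If you want to audit the SLA rather than argue about it, the calculation is just release date versus deploy date per host. A rough sketch - the seven-day target matches the advice above, but the deployment log and hostnames are invented:

```python
from datetime import date

RELEASE = date(2026, 5, 12)   # May 2026 Patch Tuesday
SLA_DAYS = 7                  # target turnaround for critical RCE

# Hypothetical deployment log: host -> date the cumulative update landed.
deployed = {
    "dc01":  date(2026, 5, 14),
    "web01": date(2026, 5, 28),
    "app02": date(2026, 5, 18),
}

breaches = sorted(
    host for host, landed in deployed.items()
    if (landed - RELEASE).days > SLA_DAYS
)
```

Run against your real change records, the `breaches` list is the risk profile your stakeholders actually signed off on, whether they know it or not.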
Key Takeaways
- MDASH is Microsoft's new multi-model agentic security AI. Built by Autonomous Code Security, MORSE and WARP. Over 100 specialised agents on top of a configurable model panel.
- It found 16 new Windows networking and authentication vulnerabilities including four critical RCE, all patched in May 2026 Patch Tuesday (KB5089549 and KB5087420 for Windows 11).
- It beat Anthropic's Claude Mythos Preview on the public CyberGym benchmark by roughly 5 points, scoring 88.45% on 1,507 real-world vulnerabilities.
- The architecture is the news, not the score. A panel of models being orchestrated by a smart harness outperforms any single model, and the gap is widening.
- Quarterly patching is officially obsolete for critical RCE. When vulnerabilities are being discovered and patched inside a single research cycle, your 30-day SLA is the attacker's runway.
- The defender side is winning right now, but only for vendors who can afford the pipeline. That has its own structural implications for the long tail of small software vendors.
FAQ
What is Microsoft MDASH?
MDASH (multi-model agentic scanning harness) is Microsoft's new AI security system. It orchestrates over 100 specialised AI agents across an ensemble of frontier and distilled models to autonomously discover, validate and prove exploitable software vulnerabilities. It was built by Microsoft's Autonomous Code Security team in collaboration with MORSE and WARP.
What is the CyberGym benchmark?
CyberGym is a public third-party benchmark of 1,507 real-world software vulnerabilities used to evaluate how well AI systems can autonomously discover and reason about exploitable bugs. MDASH currently sits at the top of the public CyberGym leaderboard with an 88.45% score.
How does MDASH compare to Claude Mythos Preview?
Both are advanced agentic AI systems for vulnerability discovery, but they take different architectural approaches. Mythos Preview is a single specialised frontier model trained for cybersecurity. MDASH is a harness that orchestrates a configurable panel of multiple models. On the public CyberGym benchmark at announcement, MDASH scored approximately 5 points higher than Anthropic's Mythos submission. Read more about Mythos in our deep dive.
Is MDASH publicly available?
No. MDASH is in limited private preview as of May 2026. Microsoft has opened a waitlist for selected partners and customers. It will likely surface inside the Defender suite before it becomes generally available, in the same pattern as Security Copilot.
Which Windows vulnerabilities did MDASH find?
16 new vulnerabilities across tcpip.sys, http.sys, ikeext.dll, netlogon.dll, dnsapi.dll and telnet.exe. Four are rated critical remote code execution. All were patched in the May 2026 Patch Tuesday release on 12 May 2026.
What should I do about this as an IT pro?
Apply the May 2026 Patch Tuesday updates across your Microsoft estate immediately. Review your patching SLA - if it's longer than seven days for critical RCE, it is now unfit for purpose. Subscribe to the MDASH preview waitlist if you maintain internal applications. And read the Microsoft Security Blog announcement in full.
My Take
I have spent fifteen years watching the gap between "research demo" and "operational reality" be measured in years, not weeks. MDASH closed that gap in a single Patch Tuesday cycle. The research blog and the cumulative update went out on the same day. The bugs it found were in the patch. That is a different operating tempo than the one most enterprise security teams are scoped for.
The other thing that keeps me up: 88.45% is the public number. Whatever Microsoft, Google and Anthropic are running internally is not on the public leaderboard. The internal numbers are higher. We just don't see them.
For defenders, this is the best year in a long time to be running Microsoft infrastructure. The vendor is shipping more secure code than they have at any point this century, and the rate of improvement is going one direction. For everyone else - all the SaaS vendors with no internal MDASH equivalent, all the consumer hardware running open-source firmware nobody audits - the pressure is going to get worse before it gets better. AI red team capability scales for attackers in the same way it scales for defenders. The defenders with the pipelines win. The defenders without them get found.
The single most useful thing you can do this week is patch. The single most useful thing you can do this quarter is shrink your patching SLA. The single most useful thing you can do this year is start the budget conversation about how your team uses agentic AI for code review. None of this is exciting. But it works.
Mathew Clark Founder, SecureInSeconds Currently: deciding whether the most polite thing to call my patching SLA is "aspirational".
Further Reading
- Defense at AI speed: Microsoft's new multi-model agentic security system tops leading industry benchmark - the primary Microsoft Security Blog announcement
- Defense at AI speed: Microsoft's new multi-model agentic security system finds 16 new vulnerabilities - companion post with the CVE list and Windows component breakdown
- Microsoft on pace to break annual vulnerability record as AI-driven patch wave takes hold - The Record's analysis of the 2026 vulnerability cadence
- Microsoft's multi-agent AI system tops Anthropic's Mythos on cybersecurity benchmark - GeekWire's coverage of the leaderboard result
- Claude Mythos Preview: The AI That Escaped Its Sandbox and Emailed a Stranger - companion SecureInSeconds analysis of the previous month's AI security shock
- Your Copilot Rollout Is A Security Disaster - the operational sister piece on agentic AI inside the Microsoft estate
- Out of Band: A Microsoft Security Podcast - May 2026 Episode - the full conversation between Andrew O'Young, Anthony Porter and me where this MDASH discussion happened