MCP STDIO Defaults: Unproven Execution by Design

A systemic design flaw in Anthropic's MCP SDKs lets STDIO server configuration run arbitrary commands, which the operator never authorized, with the full privileges of the host process

Securityv0 Intelligence Team | OWASP: ASI05 | sv0 finding: unproven_execution
mcp anthropic unproven-execution supply-chain llm-agent asi05

The Incident

On 2026-04-16, researchers disclosed a systemic design flaw in Anthropic’s official Model Context Protocol (MCP) software development kits (Python, TypeScript, Java, and Rust) that turns STDIO-based server configuration into an arbitrary-command-execution surface inside any host process that embeds the SDK. Independent reporting puts the exposure at roughly 7,000 publicly reachable MCP servers and more than 150 million cumulative downloads of affected packages. At least ten CVEs were assigned across downstream consumers, including LiteLLM, LangChain, LangFlow, Flowise, Letta, and LangBot.

The disclosure bundle includes CVE-2026-33032 (“MCPwn”), a CVSS 9.8 missing-authentication flaw in the nginx-ui MCP server that is reportedly being actively exploited; CVE-2026-23744, a CVSS 9.8 RCE in MCPJam Inspector; CVE-2026-32211, a CVSS 9.1 information-disclosure flaw in Azure MCP Server caused by missing authentication; CVE-2026-30615, a CVSS 8.0 prompt-injection-to-RCE chain in Windsurf; and CVE-2026-20205, an information-disclosure issue in Splunk MCP Server. Anthropic has declined to change the SDK or protocol behavior, characterizing it as “expected.”

MITRE ATT&CK coverage: T1059 (Command and Scripting Interpreter), T1195.002 (Compromise Software Supply Chain), T1569 (System Services).

The Authority Path That Failed

The identity carrying execution authority at the moment of failure is the MCP host process — the AI runtime, agent, IDE plugin, or service that instantiates an MCP client and spawns an MCP server subprocess. The scope that identity held is the full capability set of the operating-system user running the host: filesystem, network sockets, environment-variable credentials, and any cloud tokens the agent was configured with. The scope it exercised at failure is arbitrary binary invocation driven by the command and args fields in an MCP server configuration — fields whose source is often a project file, a dynamically fetched server registry, or a user-supplied plugin manifest rather than a signed operator decision.
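The held scope matters because a child process typically receives a copy of its parent's environment unless the caller passes one explicitly. The sketch below is one way a host could narrow that before spawning a server. The `SAFE_KEYS` allowlist and the `minimal_env` helper are hypothetical names for illustration and are not part of any MCP SDK.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical allowlist: variables a typical server needs in order to start.
# Anything else (cloud tokens, API keys) is withheld from the child.
SAFE_KEYS = ("PATH", "HOME", "LANG")


def minimal_env(parent_env: Mapping[str, str],
                allowed: Iterable[str] = SAFE_KEYS) -> Dict[str, str]:
    """Return a copy of parent_env restricted to explicitly allowed keys."""
    allowed_set = set(allowed)
    return {key: value for key, value in parent_env.items() if key in allowed_set}
```

A host that spawns servers through Python's `subprocess` module could pass the result as the `env=` argument, for example `subprocess.Popen(argv, env=minimal_env(os.environ))`, so that credentials held by the host are not silently inherited.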

The trust anchor that failed first is the SDK default. MCP’s STDIO transport bootstrap treats configured command/args as trusted operator intent and feeds them to execve-equivalent calls with minimal or absent quoting and no allowlist. Downstream projects — LiteLLM, LangChain, LangFlow, Flowise, Letta, LangBot — inherited this assumption and compounded it by accepting MCP server definitions from lower-trust channels. The gap between held scope (full local identity) and exercised scope (attacker-chosen binary) was auditable before the CVEs landed: every MCP spawn point whose command source is not a signed, operator-pinned manifest is an unproven-execution risk that the SDK will not catch on the operator’s behalf.
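A minimal sketch of the gate the SDK default leaves out, assuming a hypothetical operator-maintained manifest of exact command/args pairs. `ApprovedSpawn`, `gate_spawn`, `MANIFEST`, and the example binary path are illustrative names, not SDK APIs; the host is assumed to call the gate before handing anything to its STDIO transport.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class ApprovedSpawn:
    """One operator-approved command line (hypothetical manifest entry)."""
    command: str             # absolute path to the binary
    args: Tuple[str, ...]    # exact argument vector, order-sensitive


def gate_spawn(command: str, args: Sequence[str],
               manifest: FrozenSet[ApprovedSpawn]) -> List[str]:
    """Return an argv list only if the exact command/args pair was approved.

    Raises PermissionError otherwise, so an unapproved configuration
    fails closed instead of reaching a process-spawning call.
    """
    candidate = ApprovedSpawn(command, tuple(args))
    if candidate not in manifest:
        raise PermissionError(
            f"unapproved MCP spawn: {command!r} {list(args)!r}")
    return [command, *args]


# Hypothetical manifest; real entries would be loaded from a signed file.
MANIFEST = frozenset({
    ApprovedSpawn("/usr/local/bin/example-mcp-server", ("--stdio",)),
})
```

The returned value is an argv list intended for a no-shell spawn such as `subprocess.Popen(argv)`, so no quoting layer sits between the approved vector and the process that actually runs. Matching the full argument vector, and not only the binary, is a deliberate choice here: an approved interpreter with attacker-chosen arguments is still attacker-chosen code.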

SecurityV0 Perspective

This fits unproven_execution / ASI05. The defect is not that a specific MCP server is buggy; it is that the deploying operator never explicitly authorized the binaries the SDK will spawn on their behalf. Anthropic’s “expected behavior” response makes this a durable finding rather than a patchable one: until an operator gates MCP server spawn themselves, every new release inherits the same surface.

The evidence pack for this finding would enumerate, per AI runtime in the environment (Claude Code, Cursor, Windsurf, every LLM gateway, every agent framework that loads MCP): the full list of MCP server spawn points, the source of each command/args pair (operator-signed manifest, project file, registry fetch, dynamic plugin), the file path and package hash of the binary that would be invoked, and the host-process identity under which it runs. Before an incident, that pack answers a specific question: which MCP spawn points would execute code the operator never explicitly approved? After the fact, it answers the forensic question investigators are asking now: which host processes spawned which MCP binaries, sourced from which configuration channel, during the exposure window?
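One record of such an evidence pack could be assembled roughly as follows. This is a sketch under stated assumptions: it assumes the client configuration is JSON with a top-level `mcpServers` object whose entries carry `command` and `args`, which some MCP clients use but which should be checked per runtime. The `SpawnPoint` fields, the `inventory` function, and the `source` labels are hypothetical.

```python
import hashlib
import json
import shutil
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpawnPoint:
    runtime: str                  # which AI runtime loaded the config
    server_name: str              # key under "mcpServers" (assumed layout)
    command: str                  # command exactly as configured
    args: List[str]
    source: str                   # e.g. "operator-manifest", "project-file"
    resolved_path: Optional[str]  # binary the OS would actually run, if found
    sha256: Optional[str]         # hash of that binary, if found
    run_as: str                   # host-process identity


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(config_text: str, runtime: str, source: str,
              run_as: str) -> List[SpawnPoint]:
    """Build one SpawnPoint per configured server without spawning anything."""
    servers = json.loads(config_text).get("mcpServers", {})
    records = []
    for name, entry in servers.items():
        command = entry.get("command", "")
        # shutil.which resolves the command against PATH, like a spawn would.
        resolved = shutil.which(command) if command else None
        records.append(SpawnPoint(
            runtime=runtime,
            server_name=name,
            command=command,
            args=list(entry.get("args", [])),
            source=source,
            resolved_path=resolved,
            sha256=sha256_of(resolved) if resolved else None,
            run_as=run_as,
        ))
    return records
```

Records can be serialized with `dataclasses.asdict` and `json.dumps` for storage. A record whose `resolved_path` is empty is still worth keeping: it marks a spawn point that would resolve differently, or not at all, on another machine.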

What To Do

  • Inventory every MCP spawn point before you patch. Enumerate each running MCP client across developer workstations, CI runners, and production agent services; for each, capture the resolved command, args, working directory, and package hash. A CVE fix in one downstream project is local to that project; the SDK default behavior persists across versions.
  • Pin MCP server commands to signed, operator-approved manifests. Treat the command/args pair as a supply-chain artifact. Reject configurations whose command source is an unsigned project file, a plugin marketplace, or a dynamically fetched registry. Require explicit approval for every binary the SDK will spawn.
  • Patch or remove the high-CVSS bundle now. Prioritize CVE-2026-33032 (nginx-ui, CVSS 9.8, actively exploited), CVE-2026-23744 (MCPJam Inspector, CVSS 9.8), CVE-2026-32211 (Azure MCP Server, CVSS 9.1), CVE-2026-30615 (Windsurf, CVSS 8.0), and CVE-2026-20205 (Splunk MCP Server). Confirm patch status against each vendor advisory, not against a blanket “MCP update.”
  • Partition host-process identity from agent identity. An MCP host that spawns third-party servers should not run with cloud credentials, production tokens, or filesystem access beyond what the agent’s stated function requires. Move long-lived credentials out of environment variables and into per-call, scope-limited issuers; run MCP hosts under a dedicated, low-privilege OS user.
  • Alert on MCP spawn diffs, not just on known-bad binaries. Per-client-ID baselines of expected MCP spawn behavior — which binaries, which args, which parent process — turn novel command/args pairs into actionable signal. Feed spawn events into EDR with the MCP server identity attached; an unexpected binary path under a Claude Code or Cursor parent PID is higher fidelity than a generic process-tree anomaly.
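The spawn-diff idea in the last item can be sketched as a set comparison against a recorded baseline. The `SpawnEvent` fields and the `novel_spawns` function are hypothetical; how events are collected (EDR, audit logs, a wrapper around the spawn call) is left open and would differ per environment.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class SpawnEvent:
    client_id: str           # MCP client the spawn is attributed to
    parent_process: str      # e.g. the IDE or agent process name
    binary: str              # resolved path of the spawned binary
    args: Tuple[str, ...]    # exact argument vector


def novel_spawns(baseline: Set[SpawnEvent],
                 observed: Iterable[SpawnEvent]) -> List[SpawnEvent]:
    """Return observed events absent from the baseline, in first-seen order.

    Because client_id is part of the event, a binary that is normal for
    one client is still reported when a different client spawns it.
    """
    reported: Set[SpawnEvent] = set()
    novel: List[SpawnEvent] = []
    for event in observed:
        if event not in baseline and event not in reported:
            reported.add(event)
            novel.append(event)
    return novel
```

Exact matching on the full tuple is the strictest choice and will be noisy for servers whose arguments legitimately vary; a real deployment would probably normalize volatile arguments before comparison.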

Sources