Supply Chain Compromise

LMDeploy SSRF: The Inference NHI Was the Real Target

A vision-language image loader in LMDeploy became an SSRF primitive, exposing GPU node IAM credentials 12 hours after CVE-2026-33626 disclosure

SecurityV0 Intelligence Team OWASP: ASI06 sv0 finding: nhi_compromise
lmdeploy nhi ssrf vlm iam asi06

The Incident

On 2026-04-21, GitHub published advisory GHSA-6w67-hwm5-92mq (later assigned CVE-2026-33626, CVSS 7.5), a Server-Side Request Forgery vulnerability in LMDeploy, the LLM and vision-language-model inference toolkit from InternLM and Shanghai AI Lab. The flaw lives in load_image() in lmdeploy/vl/utils.py: the function fetches any URL handed to it as image input, with no validation against private, loopback, or link-local address ranges. Versions prior to 0.12.3 are affected; 0.12.3 patches the issue.

Within 12 hours and 31 minutes of advisory publication, Sysdig’s Threat Research Team observed active exploitation against their honeypot fleet. The attacker used the image loader as a generic HTTP-fetch primitive and port-scanned the internal network behind the model server, hitting the AWS Instance Metadata Service at 169.254.169.254, Redis on 6379, MySQL on 3306, a secondary HTTP administrative interface on 8080, and an out-of-band DNS exfiltration endpoint. The practical impact is not a missing image — it is that whatever cloud credentials the host node carried are now reachable from an unauthenticated inference API.
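The exploit needs nothing beyond a normal inference call. A minimal sketch of the probe a scanner would send, assuming an OpenAI-style chat schema on the serving endpoint; the model name and field layout here are illustrative, not taken from the advisory:

```python
import json

# The credentials listing path on the AWS Instance Metadata Service.
IMDS_CREDS = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

def ssrf_probe_payload(target_url: str) -> dict:
    """Build a chat request that smuggles an internal URL through the
    vision input. An unpatched load_image() fetches target_url verbatim,
    so the model's answer (or an error) leaks the internal response."""
    return {
        "model": "internvl-chat",  # hypothetical model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": target_url}},
            ],
        }],
    }

payload = ssrf_probe_payload(IMDS_CREDS)
print(json.dumps(payload, indent=2))
```

The same shape, pointed at 6379, 3306, or 8080 instead of the metadata service, is the port scan Sysdig observed: the image field is a generic fetch oracle.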

MITRE ATT&CK coverage: T1190 (Exploit Public-Facing Application), T1552.005 (Unsecured Credentials: Cloud Instance Metadata API), T1567 (Exfiltration Over Web Service).

The Authority Path That Failed

The identity carrying execution authority is the LMDeploy serving process on a GPU node. The scope it holds is whatever cloud credentials the node was configured with — the AWS instance profile, any SSM or Secrets Manager permissions, S3 model-bucket access, cross-service roles, and so on. The scope it should exercise is narrow: pull model artifacts from a few known buckets, write logs, serve inference requests. The scope it actually exercised on 2026-04-21 was “fetch an arbitrary URL supplied by an unauthenticated API caller” — because load_image() treats the image URL field as a generic HTTP fetch, including targets like http://169.254.169.254/latest/meta-data/iam/security-credentials/.

The trust anchor that failed first is input validation: the image-URL field arrives as untrusted user data, yet the VLM surface dereferences it as a trusted fetch target. There is no allowlist of permitted image hosts, no private-IP or link-local block, no IMDSv2-only requirement at the node level, and no instance-profile scoping that limits what a successful IMDS fetch can return. Each gap was auditable before the CVE landed: every VLM inference endpoint that accepts user-supplied URLs is a non-human identity with HTTP-egress authority the operator likely did not realize it held.
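The missing guard is small. A resolve-then-check sketch using Python's stdlib ipaddress module, illustrative only and not the actual 0.12.3 patch:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_image_url(url: str) -> bool:
    """Reject URLs whose host resolves to anything outside global address
    space (private, loopback, link-local, reserved). This is the guard
    load_image() lacked; every resolved address must pass."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    return all(
        ipaddress.ip_address(info[4][0]).is_global for info in infos
    )

# The IMDS endpoint and internal services fail the check.
assert not is_safe_image_url("http://169.254.169.254/latest/meta-data/")
assert not is_safe_image_url("http://10.0.0.5:6379/")
```

A production guard would also pin the resolved address for the actual fetch, so a DNS answer cannot change between the check and the request (the classic rebinding bypass), and would re-validate on every redirect.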

SecurityV0 Perspective

This fits nhi_compromise / ASI06. The SSRF is the vector, but the prize is the instance-profile credential — a non-human identity that the inference workload held, whose blast radius (S3, ECR, cross-account assume-role, production secrets) exceeded anything the load_image() function functionally needed. SecurityV0 treats the inference process as an NHI in its own right, with a justifiable authority baseline; any gap between that baseline and the IAM role actually attached to the GPU node surfaces as a finding before an attacker finds it first.

The evidence pack would enumerate, per VLM serving endpoint in the environment: which lmdeploy (or equivalent) processes accept user-controlled URL parameters, the IAM instance profile bound to each GPU node, IMDSv2 enforcement state, egress network policy toward 169.254.169.254 and other internal ranges, and the last-rotation timestamp plus scope of any cloud tokens reachable from the inference process. Before exfiltration, the pack answers: which inference NHIs hold credentials they never needed, and which of those NHIs expose an HTTP-fetch surface to unauthenticated callers? After the fact, it answers the forensic question: which VLM processes issued IMDS calls during the exposure window, which IAM role creds were minted, and where did those creds reach next?
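As a sketch, the pre-exfiltration question reduces to a join over three per-endpoint facts: a user-controlled URL surface, reachable metadata service, and no IMDSv2 enforcement. The record shape below is hypothetical, not a SecurityV0 schema:

```python
# Illustrative inventory records; field names are invented for this sketch.
ENDPOINTS = [
    {"host": "gpu-node-1", "runtime": "lmdeploy==0.12.1",
     "accepts_user_urls": True, "imdsv2_required": False,
     "egress_to_imds": True, "iam_role": "gpu-prod-admin"},
    {"host": "gpu-node-2", "runtime": "lmdeploy==0.12.3",
     "accepts_user_urls": True, "imdsv2_required": True,
     "egress_to_imds": False, "iam_role": "gpu-inference-ro"},
]

def exposed_nhis(endpoints: list[dict]) -> list[str]:
    """An inference NHI is flagged when an unauthenticated URL-fetch
    surface coexists with reachable, IMDSv1-style credential access."""
    return [e["host"] for e in endpoints
            if e["accepts_user_urls"]
            and e["egress_to_imds"]
            and not e["imdsv2_required"]]

print(exposed_nhis(ENDPOINTS))  # gpu-node-1 is the pre-exfiltration finding
```

The same records answer the forensic half of the question after the fact: join the flagged hosts against IMDS call logs and STS credential-mint events during the exposure window.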

What To Do

  • Upgrade LMDeploy to 0.12.3 everywhere VLMs are served. Enumerate every host running lmdeploy, resolve the installed version, and patch to 0.12.3 or later. The fix is a version pin against CVE-2026-33626; a blanket “update Python” is not sufficient. Confirm against the GitHub release, not the last pip freeze.
  • Enforce IMDSv2-only on every GPU node running an inference workload. Set HttpTokens=required and HttpPutResponseHopLimit=1 on every EC2 instance metadata configuration serving VLMs. IMDSv2’s session-token handshake turns a blind SSRF into an exchange the attacker cannot complete; IMDSv1 hands credentials back on a single GET. Check coverage with aws ec2 describe-instances filtering on MetadataOptions.HttpTokens.
  • Scope the inference NHI to its actual job. The GPU node’s instance profile should allow only the S3 prefixes, ECR repos, and Secrets Manager paths the inference workload reads at startup. Drop wildcards, drop cross-account assume-role, drop write permissions to any bucket the process does not publish to. The test is: if this credential leaks, does the attacker get just model weights, or your production plane?
  • Block egress from model-serving pods to 169.254.169.254 and RFC1918 ranges unless explicitly required. Kubernetes NetworkPolicy or VPC security groups scoped to the inference namespace make SSRF-to-IMDS a closed path rather than an open one. Pair with a default-deny egress policy; explicit allowlists for S3, ECR, and the model registry are typically sufficient for an inference node’s real traffic pattern.
  • Alert on IMDS calls originating from an inference process. EDR and cloud-audit logs can flag 169.254.169.254 fetches whose parent process is lmdeploy, vllm, text-generation-inference, or any similar serving runtime — that signal should be zero in a healthy environment. Feed those events to the same queue that handles SSRF signatures, not to a generic cloud-anomaly bin.
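The IMDSv2 check above is mechanical once the fleet inventory is in hand. A sketch over the MetadataOptions shape that aws ec2 describe-instances returns, run here against sample records rather than live API calls:

```python
def needs_imds_hardening(instance: dict) -> bool:
    """True when an EC2 instance still answers IMDSv1 requests or allows
    extra metadata hops. `instance` follows the MetadataOptions shape
    from the describe-instances response."""
    opts = instance.get("MetadataOptions", {})
    return (opts.get("HttpEndpoint") == "enabled"
            and (opts.get("HttpTokens") != "required"
                 or opts.get("HttpPutResponseHopLimit", 0) > 1))

# Sample fleet: one IMDSv1-tolerant node, one hardened node.
fleet = [
    {"InstanceId": "i-0aaa", "MetadataOptions":
        {"HttpEndpoint": "enabled", "HttpTokens": "optional",
         "HttpPutResponseHopLimit": 2}},
    {"InstanceId": "i-0bbb", "MetadataOptions":
        {"HttpEndpoint": "enabled", "HttpTokens": "required",
         "HttpPutResponseHopLimit": 1}},
]
print([i["InstanceId"] for i in fleet if needs_imds_hardening(i)])
```

The hop limit of 1 matters for containerized serving: a pod is one network hop past the node, so HttpPutResponseHopLimit=1 keeps even a valid IMDSv2 session token from crossing into the pod.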

Sources