Prompt Injection Just Got Classified as Malware
Field Guide
Prompt Injection Just Got Classified as Malware
Researchers want prompt injection reclassified as malware. A $40K bounty from UK AISI, OpenAI, and Anthropic is testing why.
Key takeaway
Researchers are formalizing 'promptware'—a new classification that treats prompt injection attacks as actual malware, with kill chains that mirror traditional multi-stage exploits
Key takeaway
A $40K bounty running Feb 25 - Mar 11, 2026 (UK AISI, OpenAI, Anthropic, Amazon, Meta, Google DeepMind) proves the threat is real enough to fund research against actual frontier models
Key takeaway
Success rates exceed 85% against state-of-the-art defenses, and OWASP lists prompt injection as the #1 risk for LLM applications—but the industry still treats it like a chatbot insult problem
An LLM is an execution environment. Prompt injection is malware. Researchers are now making that case formally.
That’s the argument researchers from Tel Aviv University, Ben-Gurion University, and others are making with increasing certainty. They’ve published a kill chain framework showing how prompt injections work exactly like traditional multi-stage attacks: initial access, privilege escalation, reconnaissance, persistence, command and control, lateral movement, actions on objective.
They call it “promptware.”
Answer-First Summary
Researchers propose reclassifying prompt injection as malware because LLMs operate as execution environments for adversarial instructions, similar to operating systems running malicious code. A $40K bounty from major AI labs is validating these concerns. Attack success rates exceed 85% against current defenses, yet most organizations still treat prompt injection as a peripheral security concern rather than a critical runtime threat.
What changes when we call it malware?
The word matters more than it should. “Prompt injection” sounds like a user input validation problem. Like SQL injection. Like XSS. Fix your filters, sanitize your inputs, move on.
But LLMs don’t work like databases. They don’t parse statements. They execute instructions written in natural language. The distinction isn’t semantic—it changes what we defend against.
When you send a prompt to an LLM, that model will execute whatever instructions it understands. That’s not a bug. That’s the core feature. The model’s job is to process language and follow directions. So the defense can’t be “block bad inputs.” The defense has to be “ensure only authorized inputs can reach the model” and “limit what the model is allowed to do once reached.”
That’s a runtime security problem. That’s what malware mitigation looks like.
Why researchers are drawing this parallel
In January 2026, researchers published “The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism” on arXiv. They analyzed 36 major studies and real-world incidents. At least 21 documented attacks traversed four or more stages of their kill chain framework.
The stages look familiar if you’ve ever audited actual breach reports:
- Initial Access: Prompt injection through user input, documents, or web content
- Privilege Escalation: Jailbreaking or manipulating system prompts
- Reconnaissance: Extracting system information or probing available tools
- Persistence: Poisoning memory, vector databases, or retrieval systems
- Command and Control: Establishing communication channels to the attacker
- Lateral Movement: Jumping to other systems, users, or services
- Actions on Objective: Data exfiltration, code execution, financial fraud
The researchers found real examples of all seven. Not theoretical. Documented. It’s not “a chatbot said something we didn’t want.” It’s “an attacker executed arbitrary code on a victim’s system through a compromised LLM application.”
The evidence is in the funding
The UK Artificial Intelligence Safety Institute, OpenAI, Anthropic, Amazon, Meta, and Google DeepMind are currently running a $40K bounty competition. It runs through March 11, 2026. The focus: indirect prompt injection attacks against frontier models in real agentic environments (tool use, browser automation, coding agents).
You don’t fund $40K in research for something you think is a minor issue. You fund it when you’re worried. And you fund it through an independent safety institute when you want credible, third-party validation.
The prize structure reveals what they’re actually testing. Top performers get $2K. But there’s also $500 per model awarded to anyone who can successfully break into them—up to the first 500 breaks on any given model. That’s not a “find the worst attack” contest. That’s a “how many different ways can you exploit this” contest.
What the 85% statistic actually means
When we say attack success rates exceed 85% against state-of-the-art defenses, we’re not talking about simple attacks. We’re talking about adaptive attacks where the attacker observes the defense and modifies the approach. This is what researchers call “the attacker moves second” principle.
The defender publishes a rule. No slurs allowed. No instructions to ignore system prompts. The attacker sees the rule and writes a prompt that works around it. By the time the defense is public, it’s already obsolete.
And it’s not just one defense failing. Anthropic, OpenAI, Google, and Meta have each published defenses. None of them work consistently across all attack vectors. The frontier models themselves remain vulnerable after applying their own best mitigations.
OWASP lists prompt injection as the #1 risk in their Top 10 for Large Language Model Applications. It’s been #1 since the list was created. That means security teams have known about this for months. And yet most organizations with LLM applications in production don’t treat it like a runtime threat. They treat it like they’re waiting for the technology to mature before worrying.
The evidence suggests the time to worry is now.
The human parallel: systems executing untrusted instructions
Here’s the uncomfortable truth. You wouldn’t deploy a server that executes shell commands provided by untrusted users. You wouldn’t give an application sudo access to someone else’s filesystem without authentication and auditability. You wouldn’t trust a system that runs arbitrary code from the internet.
But we’re doing all three with LLMs. We’re treating “the model will process whatever instructions it receives” as a feature instead of a threat boundary.
The reason is philosophical. We think of LLMs as helpers. As tools that are supposed to be helpful, harmless, and honest. So when an LLM gets “jailbroken” into acting harmfully, we blame the jailbreak. We blame the user. We blame the attacker.
We don’t blame the fundamental design choice that the system can be jailbroken at all.
Malware doesn’t work that way. A malicious binary on your computer is malicious because it executes unauthorized instructions with the permissions granted to the process. The defenses are boundaries: memory isolation, privilege levels, runtime permissions, code signing. The defenses don’t say “be a good binary.” They say “you cannot do this thing, regardless of what instruction you receive.”
That’s the defense model we need for LLMs. Not prompt filtering. Not jailbreak detection. Actual execution boundaries.
What to do this week
If you’re shipping LLM applications:
Immediate: Map where untrusted input can reach your models. Documents. Web content. User uploads. External APIs. Any of these can carry prompt injections.
This week: Test with adversarial prompts. Not the obvious ones. Indirect injections. Context-aware attacks. Use the OWASP LLM Prompt Injection Prevention Cheat Sheet as a starting framework, but don’t assume it’s sufficient.
Before production: Implement permission boundaries around what your LLM application is actually allowed to do. If it only needs to summarize documents, it shouldn’t have access to delete files or send emails. If it’s a customer support agent, it shouldn’t be able to access other customers’ data.
The defense isn’t “block bad inputs.” It’s “assume any input might be adversarial and limit what the system can do if compromised.”
Next in the series: One Firebase Misconfig Leaked 300M Chat Messages — How a single configuration error exposed the exact threat we’re trying to defend against.