Security

Prompt Injection Just Got Classified as Malware

If you are working on agent security and ai security, this is for you.

Take Interest Inc.March 2, 20265 min readLast reviewed 2026-03-02

ai-securityruntime-protectiongovernance

Table of contents

Key takeaway

Researchers are formalizing 'promptware': a new classification that treats prompt injection attacks as actual malware, with kill chains that mirror traditional multi-stage exploits

Key takeaway

A $40K bounty running Feb 25 - Mar 11, 2026 (UK AISI, OpenAI, Anthropic, Amazon, Meta, Google DeepMind) proves the threat is real enough to fund research against actual frontier models

Key takeaway

Success rates exceed 85% against current defenses, and OWASP lists prompt injection as the #1 risk for LLM applications, but the industry still treats it like a chatbot insult problem

An LLM is an execution environment. Prompt injection is malware. Researchers are now making that case formally.

That’s the argument researchers from Tel Aviv University, Ben-Gurion University, and others are making with increasing certainty. They’ve published a kill chain framework showing how prompt injections work exactly like traditional multi-stage attacks: initial access, privilege escalation, reconnaissance, persistence, command and control, lateral movement, actions on objective.

They call it “promptware.”

Answer-First Summary

Researchers propose reclassifying prompt injection as malware because LLMs operate as execution environments for adversarial instructions, similar to operating systems running malicious code. A $40K bounty from major AI labs is validating these concerns. Attack success rates exceed 85% against current defenses, yet most organizations still treat prompt injection as a peripheral security concern rather than a critical runtime threat.

What changes when we call it malware?

The word matters more than it should. “Prompt injection” sounds like a user input validation problem. Like SQL injection. Like XSS. Fix your filters, sanitize your inputs, move on.

But LLMs don’t work like databases. They don’t parse statements. They execute instructions written in natural language. The distinction isn’t semantic. It changes what we defend against.

When you send a prompt to an LLM, that model will execute whatever instructions it understands. That’s not a bug. That’s the core feature. The model’s job is to process language and follow directions. So the defense can’t be “block bad inputs.” The defense has to be “ensure only authorized inputs can reach the model” and “limit what the model is allowed to do once reached.”

That’s a runtime security problem. That’s what malware mitigation looks like.

Why researchers are drawing this parallel

In January 2026, researchers published “The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism” on arXiv. They analyzed 36 major studies and real-world incidents. At least 21 documented attacks traversed four or more stages of their kill chain framework.

The stages look familiar if you’ve ever audited actual breach reports:

Initial Access: Prompt injection through user input, documents, or web content
Privilege Escalation: Jailbreaking or manipulating system prompts
Reconnaissance: Extracting system information or probing available tools
Persistence: Poisoning memory, vector databases, or retrieval systems
Command and Control: Establishing communication channels to the attacker
Lateral Movement: Jumping to other systems, users, or services
Actions on Objective: Data exfiltration, code execution, financial fraud

The researchers found real examples of all seven. Not theoretical. Documented. The pattern is “an attacker executed arbitrary code on a victim’s system through a compromised LLM application,” not “a chatbot said something we didn’t want.”

The evidence is in the funding

The UK Artificial Intelligence Safety Institute, OpenAI, Anthropic, Amazon, Meta, and Google DeepMind are currently running a $40K bounty competition. It runs through March 11, 2026. The focus: indirect prompt injection attacks against frontier models in real agentic environments (tool use, browser automation, coding agents).

You don’t fund $40K in research for something you think is a minor issue. You fund it when you’re worried. And you fund it through an independent safety institute when you want credible, third-party validation.

The prize structure reveals what they’re actually testing. Top performers get $2K. But there’s also $500 per model awarded to anyone who can successfully break into them—up to the first 500 breaks on any given model. That’s not a “find the worst attack” contest. That’s a “how many different ways can you exploit this” contest.

What the 85% statistic actually means

When we say attack success rates exceed 85% against current defenses, we’re not talking about simple attacks. We’re talking about adaptive attacks where the attacker observes the defense and modifies the approach. This is what researchers call “the attacker moves second” principle.

The defender publishes a rule. No slurs allowed. No instructions to ignore system prompts. The attacker sees the rule and writes a prompt that works around it. By the time the defense is public, it’s already obsolete.

And it’s not just one defense failing. Anthropic, OpenAI, Google, and Meta have each published defenses. None of them work consistently across all attack vectors. The frontier models themselves remain vulnerable after applying their own best mitigations.

OWASP lists prompt injection as the #1 risk in their Top 10 for Large Language Model Applications. It’s been #1 since the list was created. That means security teams have known about this for months. And yet most organizations with LLM applications in production don’t treat it like a runtime threat. They treat it like they’re waiting for the technology to mature before worrying.

The evidence suggests the time to worry is now.

The human parallel: systems executing untrusted instructions

Here’s the uncomfortable truth. You wouldn’t deploy a server that executes shell commands provided by untrusted users. You wouldn’t give an application sudo access to someone else’s filesystem without authentication and auditability. You wouldn’t trust a system that runs arbitrary code from the internet.

But we’re doing all three with LLMs. We’re treating “the model will process whatever instructions it receives” as a feature instead of a threat boundary.

The reason is philosophical. We think of LLMs as helpers. As tools that are supposed to be helpful, harmless, and honest. So when an LLM gets “jailbroken” into acting harmfully, we blame the jailbreak. We blame the user. We blame the attacker.

We don’t blame the fundamental design choice that the system can be jailbroken at all.

Malware doesn’t work that way. A malicious binary on your computer is malicious because it executes unauthorized instructions with the permissions granted to the process. The defenses are boundaries: memory isolation, privilege levels, runtime permissions, code signing. The defenses don’t say “be a good binary.” They say “you cannot do this thing, regardless of what instruction you receive.”

That’s the defense model we need for LLMs. Not prompt filtering. Not jailbreak detection. Actual execution boundaries.

What to do this week

If you’re shipping LLM applications:

Immediate: Map where untrusted input can reach your models. Documents. Web content. User uploads. External APIs. Any of these can carry prompt injections.

This week: Test with adversarial prompts. Not the obvious ones. Indirect injections. Context-aware attacks. Use the OWASP LLM Prompt Injection Prevention Cheat Sheet as a starting framework, but don’t assume it’s sufficient.

Before production: Implement permission boundaries around what your LLM application is actually allowed to do. If it only needs to summarize documents, it shouldn’t have access to delete files or send emails. If it’s a customer support agent, it shouldn’t be able to access other customers’ data.

The defense is “assume any input might be adversarial and limit what the system can do if compromised,” rather than “block bad inputs.”

Next in the series: One Firebase Misconfig Leaked 300M Chat Messages — How a single configuration error exposed the exact threat we’re trying to defend against.

Cite this post

Take Interest Inc. (2026). Prompt Injection Just Got Classified as Malware. TAKE INTEREST. https://takeinterest.ai/blog/prompt-injection-classified-as-malware

Take it with you

Save the link to come back to it, or pass it along.

88% of Agents Shipped Without Security Review

Gravitee's 2026 data: only 14% of orgs got full security approval before deploying agents. Here's what the other 88% have in common.

70% of Enterprises Can't See Their Own Agents

Nearly 70% of enterprises run AI agents in production. Most can't tell you how many they have, what they access, or who owns them. That's identity dark matter.

Your AI Agent Has No Seatbelt

AI agents are moving into production faster than safety standards. Runtime security controls need to arrive before the first serious incident.

Back to blog

Answer-First Summary

What changes when we call it malware?

Why researchers are drawing this parallel

The evidence is in the funding

What the 85% statistic actually means

The human parallel: systems executing untrusted instructions

What to do this week

Prompt Injection Just Got Classified as Malware

Related interests