Technical · Part 7 of The Builder's Guide to Agent Security

Seven Layers of Defense for AI Agents

If you are working on agent infrastructure and defense in depth, this is for you.

Take Interest Inc. · February 22, 2026 · 6 min read · Last reviewed 2026-02-22

Tags: defense-in-depth, architecture, guardclaw


Defense Depth Flow

A secure request path should pass each stage in order with independent failure handling.

Key takeaway

A Boeing 737 MAX crashed because a single sensor failed and the software trusted it without verification. One layer. One failure.

Key takeaway

Seven layers work together so no single failure cascades into a breach. Each catches what the previous one missed.

Key takeaway

GuardClaw implements all seven layers with over a thousand detection patterns. No single layer is enough.

The Boeing 737 MAX crashed twice. Its MCAS flight-control software relied on a single angle-of-attack sensor. The sensor failed. The software trusted it without verification. The aircraft did what the faulty sensor told it to do. One layer. One failure. Everyone on board died.

That was a system with exactly one layer of defense.

Summary: Defense in depth means each layer catches what the others missed. Boeing learned this at catastrophic cost. Agent security should learn it without the crashes. GuardClaw uses seven layers because one is not enough.

Most agent systems have bouncer-level security. Someone checks the request at the door. They look at the obvious stuff. Is this request formatted right? Does it have an API key? Then they let it in. Nobody’s watching what the agent actually does. Nobody’s checking whether the request tried to manipulate the agent into breaking its rules. Nobody’s looking at the output to see if something got exfiltrated.

That’s one layer.

A vault is different: laser grids, pressure plates, and time locks. Multiple systems, so if one fails, the others catch it. Ocean’s Eleven works because the heist team has to plan for every defense independently. If they only account for the vault door, they miss everything else. The vault’s designers think the same way from the other side: every layer assumes some previous layer has already failed.

Here are the seven layers that matter:

Layer One: Threat Intelligence. Before a request gets anywhere, it gets checked against known threat patterns. Is this IP address flagged? Does the request match the signature of a known attack? Is the user account behaving normally, or has it been compromised? This catches the obvious stuff early. It’s fast because it’s pattern matching, not deep analysis.
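A first-pass threat check can be as small as a few set and regex lookups. The sketch below is a minimal illustration, not GuardClaw's implementation: the feed contents, the signature patterns, and the name `threat_intel_check` are hypothetical, and a real deployment would load them from a maintained threat-intelligence source. Account-behavior checks are left out for brevity.

```python
import ipaddress
import re

# Hypothetical threat feed. 203.0.113.0/24 is a documentation-only range (RFC 5737).
FLAGGED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
KNOWN_SIGNATURES = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),  # prompt-injection phrasing
    re.compile(r"\.\./\.\./"),                                          # path traversal
]

def threat_intel_check(source_ip: str, payload: str) -> tuple[bool, str]:
    """Cheap lookups only: return (allowed, reason) without deep analysis."""
    addr = ipaddress.ip_address(source_ip)
    if any(addr in net for net in FLAGGED_NETWORKS):
        return False, f"flagged source {source_ip}"
    for sig in KNOWN_SIGNATURES:
        if sig.search(payload):
            return False, f"matched signature {sig.pattern!r}"
    return True, "no known threat matched"

print(threat_intel_check("198.51.100.7", "Summarize this report"))
# (True, 'no known threat matched')
print(threat_intel_check("203.0.113.9", "Summarize this report"))
# (False, 'flagged source 203.0.113.9')
```

A fresh IP address and a novel payload pass straight through a check like this, which is exactly why the later layers exist.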

Layer Two: Input Validation. The request arrives. Now it gets picked apart. Are the parameters what we expect? Do they contain injection attacks? Do they try to reach resources the user shouldn’t access? This is the bouncer at the door, but a professional one. It checks credentials and ID. It looks for fake documents.
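Here is a minimal sketch of parameter validation for one hypothetical tool call, `read_record`. The schema, the allowed character set, and passing the caller's permitted record IDs in as `user_scope` are all assumptions for illustration; the point is that every parameter is checked against an explicit expectation and anything unexpected is reported.

```python
import re

# Hypothetical schema for a "read_record" tool call.
RECORD_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")
MAX_LIMIT = 100

def validate_input(params: dict, user_scope: set[str]) -> list[str]:
    """Return a list of problems. An empty list means the input passed."""
    problems = []
    unexpected = set(params) - {"record_id", "limit"}
    if unexpected:
        problems.append(f"unexpected parameters: {sorted(unexpected)}")

    record_id = params.get("record_id")
    if not isinstance(record_id, str):
        problems.append("record_id must be a string")
    elif not RECORD_ID.fullmatch(record_id):
        # fullmatch rejects quotes, semicolons, and spaces that injection payloads rely on
        problems.append("record_id contains disallowed characters")
    elif record_id not in user_scope:
        problems.append("record_id is outside the caller's scope")

    limit = params.get("limit")
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(limit, int) or isinstance(limit, bool):
        problems.append("limit must be an integer")
    elif not 1 <= limit <= MAX_LIMIT:
        problems.append(f"limit must be between 1 and {MAX_LIMIT}")
    return problems

print(validate_input({"record_id": "1; DROP TABLE x", "limit": 10}, {"rec_42"}))
# ['record_id contains disallowed characters']
```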

Layer Three: Policy Enforcement. An agent wants to do something. Call an API. Read a database. Modify a file. This layer asks: is this action allowed right now, under these conditions? Deny-by-default policies evaluate every request against explicit rules. If no rule allows the action, nothing passes through. Hard no.
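A deny-by-default evaluator is short to write; the discipline is in never adding a fallback "allow". This sketch assumes a hypothetical `Rule` shape (agent, action, resource prefix, permitted hours) and is not GuardClaw's policy format. It also assumes resource paths were already normalized by input validation, since naive prefix matching could otherwise be fooled by `..` segments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """One explicit allow rule (hypothetical shape)."""
    agent: str
    action: str
    resource_prefix: str
    hours: range = range(0, 24)  # condition: hours of the day when the rule applies

def evaluate(rules: list[Rule], agent: str, action: str, resource: str, hour: int) -> str:
    # Assumes `resource` is already normalized; prefix matching alone is not path-safe.
    for rule in rules:
        if (rule.agent == agent
                and rule.action == action
                and resource.startswith(rule.resource_prefix)
                and hour in rule.hours):
            return "allow"
    return "deny"  # deny by default: no matching rule means a hard no

POLICY = [
    Rule("support-bot", "read", "db/tickets/"),
    Rule("support-bot", "write", "db/tickets/", hours=range(9, 17)),
]

print(evaluate(POLICY, "support-bot", "write", "db/tickets/123", hour=3))  # deny
print(evaluate(POLICY, "support-bot", "read", "db/medical/7", hour=10))    # deny
```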

Layer Four: Capability Tokens. Even if a policy allows the action, the agent needs a cryptographic token to execute it. Each token is signed, single-use, time-bound, and scope-limited. If someone replays the token or tries to use it for a different action, it fails. Every token works exactly once.
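The four token properties map onto standard primitives: an HMAC signature (signed), an expiry timestamp (time-bound), action and resource claims (scope-limited), and a nonce recorded on first use (single-use). The sketch below is an illustration under those assumptions, not GuardClaw's token format; `issue_token` and `redeem_token` are hypothetical names, and a real system would keep spent nonces in shared, persistent storage rather than process memory.

```python
import hashlib
import hmac
import json
import secrets
import time

SIGNING_KEY = secrets.token_bytes(32)  # hypothetical per-deployment signing key
_spent_nonces: set[str] = set()        # nonces already redeemed (in memory for the sketch)

def _sign(body: str) -> str:
    return hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()

def issue_token(action: str, resource: str, ttl_seconds: float = 30) -> str:
    claims = {
        "action": action,                  # scope: the one action this token permits
        "resource": resource,              # scope: the one resource it applies to
        "exp": time.time() + ttl_seconds,  # time-bound
        "nonce": secrets.token_hex(16),    # unique per token, enables single use
    }
    body = json.dumps(claims, sort_keys=True)
    return body + "." + _sign(body)

def redeem_token(token: str, action: str, resource: str) -> bool:
    body, _, sig = token.rpartition(".")  # the hex signature never contains "."
    if not hmac.compare_digest(sig.encode(), _sign(body).encode()):
        return False  # forged or tampered
    claims = json.loads(body)
    if claims["action"] != action or claims["resource"] != resource:
        return False  # presented for a different action than it was issued for
    if time.time() > claims["exp"]:
        return False  # expired
    if claims["nonce"] in _spent_nonces:
        return False  # replay
    _spent_nonces.add(claims["nonce"])
    return True

token = issue_token("read", "db/tickets/123")
print(redeem_token(token, "write", "db/tickets/123"))  # False: wrong scope
print(redeem_token(token, "read", "db/tickets/123"))   # True: first use
print(redeem_token(token, "read", "db/tickets/123"))   # False: replay
```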

Layer Five: Sandboxed Execution. The action runs inside isolated execution boundaries. Deny-by-default rules control what the agent can reach. Filesystem, network, and process isolation limit the blast radius. If the agent tries something outside its boundaries, it stops.
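Real isolation is enforced by the operating system (containers, namespaces, seccomp filters and the like), not by application code. What application code can show is the deny-by-default boundary decision itself. In this sketch the workspace root, the host allowlist, and the function names are hypothetical, and it requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical boundaries for one agent.
WORKSPACE = Path("/srv/agent-workspace").resolve()
ALLOWED_HOSTS = {"api.example.com"}

def fs_allowed(requested: str) -> bool:
    """Allow only paths that stay inside the workspace after resolving '..' and symlinks."""
    target = (WORKSPACE / requested).resolve()
    return target.is_relative_to(WORKSPACE)

def net_allowed(url: str) -> bool:
    """Allow only HTTPS to explicitly listed hosts; everything else is denied."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS

print(fs_allowed("notes/today.txt"))                   # True
print(fs_allowed("../../etc/passwd"))                  # False
print(net_allowed("https://evil.example.net/upload"))  # False
```

Resolving before checking matters: a naive string comparison would accept `notes/../../etc/passwd` because it starts inside the workspace.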

Layer Six: Human-in-the-Loop. High-risk operations pause for human approval. The approval is cryptographically bound to the specific request, which prevents tampering between the time you approve it and the time it executes. Configurable risk thresholds determine what requires a human decision.
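One way to bind an approval to a request is to sign a digest of the exact request the human saw, then recompute that signature immediately before execution. The sketch below assumes an HMAC key held by the approval service and a numeric risk score; the key, the threshold value, and every function name are hypothetical. On its own this does not stop an approval being reused for an identical request; pairing it with a single-use nonce, as in the capability-token layer, would.

```python
import hashlib
import hmac
import json

# Hypothetical values. A real approver key would live in a KMS or HSM, not in source code.
APPROVER_KEY = b"demo-approver-key"
RISK_THRESHOLD = 7  # risk scores at or above this pause for a human

def needs_approval(risk_score: int) -> bool:
    return risk_score >= RISK_THRESHOLD

def _sign_request(request: dict) -> str:
    # Canonical JSON, so key order and whitespace cannot change the digest.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).digest()
    return hmac.new(APPROVER_KEY, digest, hashlib.sha256).hexdigest()

def approve(request: dict) -> str:
    """Called when a human approves: bind the approval to the exact request they saw."""
    return _sign_request(request)

def may_execute(request: dict, approval: str) -> bool:
    """Recompute at execution time; any change since approval breaks the match."""
    return hmac.compare_digest(approval.encode(), _sign_request(request).encode())

req = {"action": "delete", "resource": "db/tickets/", "count": 120}
approval = approve(req)
print(may_execute(req, approval))                     # True
print(may_execute(dict(req, count=12000), approval))  # False: edited after approval
```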

Layer Seven: Receipt Chain. Everything that happened gets logged. Every deny decision. Every approval. Every boundary crossing. Cryptographically linked in a tamper-evident chain. Months later, an auditor asks “can you show me every time someone accessed medical records.” You can. Because every layer contributed to a structured record that cannot be rewritten after the fact without detection.
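A hash chain is the simplest way to make a log tamper-evident: each receipt stores the hash of the one before it, so editing an earlier entry breaks the links after it. This is a minimal sketch with hypothetical field names, not GuardClaw's receipt format. A chain alone only catches edits by someone who cannot recompute it end to end; real systems also sign receipts or anchor the latest hash somewhere an attacker cannot reach.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first receipt

def _link(prev_hash: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

def append_receipt(chain: list[dict], event: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev, "hash": _link(prev, event)})

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; a receipt edited after the fact no longer matches."""
    prev = GENESIS
    for receipt in chain:
        if receipt["prev"] != prev or receipt["hash"] != _link(prev, receipt["event"]):
            return False
        prev = receipt["hash"]
    return True

log: list[dict] = []
append_receipt(log, {"layer": 3, "decision": "deny", "resource": "db/medical/7"})
append_receipt(log, {"layer": 6, "decision": "approve", "approver": "alice"})
append_receipt(log, {"layer": 5, "decision": "boundary_block", "path": "../../etc/passwd"})

# The auditor's question: every receipt touching medical records.
print([r["event"] for r in log if r["event"].get("resource", "").startswith("db/medical/")])
print(verify_chain(log))  # True
```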

The reason all seven matter is that they catch different attacks. A sophisticated attacker might get past threat intelligence. They might have a fresh IP address and an unknown payload. Layer two stops them if they use injection. They get past injection. Layer three stops them if the policy denies the action. They get past the policy. Layer four stops them because their token doesn’t match or has expired. If somehow they get through all of that, layer five sandboxes the execution so the blast radius is contained.
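The walkthrough above is a pipeline: each layer runs in order, and the first one that says no ends the request. Here is a minimal sketch of that control flow, assuming each layer is a function returning `(allowed, reason)`. The layer functions are hypothetical stand-ins, and treating a crashing layer as a deny (failing closed) is a design choice assumed here rather than documented GuardClaw behavior.

```python
from typing import Callable

# A layer takes a request and returns (allowed, reason).
Layer = Callable[[dict], tuple[bool, str]]

def run_layers(request: dict, layers: list[tuple[str, Layer]]) -> tuple[bool, str]:
    """Run layers in order; the first deny stops the request."""
    for name, check in layers:
        try:
            allowed, reason = check(request)
        except Exception as exc:
            # Fail closed: a layer that crashes counts as a deny, never as a pass.
            return False, f"{name} failed: {exc}"
        if not allowed:
            return False, f"{name}: {reason}"
    return True, "all layers passed"

# Hypothetical stand-ins for two of the seven layers.
def input_validation(req: dict) -> tuple[bool, str]:
    return "'" not in req.get("query", ""), "quote character in query"

def policy(req: dict) -> tuple[bool, str]:
    return req.get("action") == "read", "only 'read' is allowed"

LAYERS = [("input_validation", input_validation), ("policy", policy)]

print(run_layers({"action": "read", "query": "ticket status"}, LAYERS))
# (True, 'all layers passed')
print(run_layers({"action": "delete", "query": "ticket status"}, LAYERS))
# (False, "policy: only 'read' is allowed")
```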

No single layer is enough because attackers are smart and creative. They’ll find the flaw in your thinking. They’ll exploit the assumption you made. The only defense is to make multiple assumptions and check them all independently.

We think this matters because we watch teams implement one or two layers and feel secure. They have input validation. They have an API key check. They think they’re protected. Then someone publishes a prompt injection technique and suddenly all that validation doesn’t matter because nobody was checking what the agent actually outputs. The layer they skipped was the one that matters right now.

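Defense in depth is harder to architect. It’s easier to have one bouncer than seven checkpoints. But every layer compounds. If the layers fail independently, the probability of an attack getting through all seven is the product of the probabilities of getting through each one. If each layer stops 90% of attacks, only 0.1^7, one attack in ten million, gets through all seven: the layers together stop 99.99999%. Real layers are rarely fully independent, so treat that figure as an optimistic estimate.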
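The compounding arithmetic is easy to check numerically. This sketch makes the same assumption the multiplication does, that each layer fails independently of the others; `combined_stop_rate` is a hypothetical helper name.

```python
def combined_stop_rate(per_layer_stop: list[float]) -> float:
    """Fraction of attacks stopped by at least one layer, assuming independent layers."""
    slip_through = 1.0
    for p in per_layer_stop:
        slip_through *= 1.0 - p  # an attack must evade every layer to get through
    return 1.0 - slip_through

for n in (1, 6, 7):
    print(n, f"{combined_stop_rate([0.90] * n):.5%}")
# 1 90.00000%
# 6 99.99990%
# 7 99.99999%
```

Each added layer at 90% removes one more factor of ten from what slips through. If two layers share a blind spot, for example both relying on the same pattern list, their failures are correlated and the real figure is lower.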

These seven layers ship today in GuardClaw: runtime security for AI agents that is deterministic, local-first, and auditable.

This connects to the next post, Why We Don’t Use AI to Make Security Decisions. Most of these layers need to be deterministic. They need to be hard rules. A probabilistic layer is a layer that can be persuaded, and a persuadable security layer isn’t a layer at all.

References:

National Transportation Safety Board (NTSB). (2020). “Boeing 737 MAX crashes investigation reports.” NTSB.gov. [How single-point failures cascade in safety-critical systems]
Schneier, B. (2000). “Secrets and Lies: Digital Security in a Networked World.” Wiley. [Defense in depth architectural principles]
NIST SP 800-53. (2024). “Security and Privacy Controls for Information Systems and Organizations.” [Multi-layer security control frameworks]

Frequently asked questions

What are the layers of AI agent security?

Strong agent security uses defense in depth: several independent layers, each catching what the one before it missed. GuardClaw runs 7 deterministic layers (threat intelligence, input validation, policy enforcement, capability tokens, sandboxed execution, human-in-the-loop approval, and a receipt chain for audit), so a single failure does not cascade into a breach. The point of multiple layers is that no one check has to be perfect.

Is input and output filtering enough to secure an AI agent?

No. Most agent setups use one or two layers, usually input filtering and maybe an output check, which leaves the middle of the system unguarded. An agent that gets past the input filter can still call the wrong tool, read the wrong data, or take an unsafe action. That is why GuardClaw enforces policy, capability tokens, and sandboxed execution as layers of their own, not just checks at the edges.

How many detection patterns does GuardClaw use?

GuardClaw ships 1,564 deterministic detection patterns across its 7 layers, with no language model in the security path, so the same input always produces the same verdict. It runs locally in your own infrastructure, so the check happens on your machine and no agent data leaves your environment.

Cite this post

Take Interest Inc. (2026). Seven Layers of Defense for AI Agents. TAKE INTEREST. https://takeinterest.ai/blog/7-layers-of-defense


