TechnicalPart 13 of GuardClaw in Practice

The Detection Engine: How It Works

If you are working on agent infrastructure and guardclaw, this is for you.

Take Interest Inc.March 22, 20267 min readLast reviewed 2026-03-22

guardclawarchitecturedetection-enginetechnical

Table of contents

Key takeaway

Four detection tiers run in sequence: fast probabilistic check, exact string matching, pattern regex, then behavioral analysis. Most actions clear tier 1 in microseconds.

Key takeaway

No AI in the detection path. Every decision is deterministic, same input, same result, every time. Auditable and predictable.

Key takeaway

The engine normalizes Unicode, decodes Base64, and resolves variable splitting before pattern matching. Attackers can't hide behind encoding.

When your agent runs a shell command, GuardClaw needs to decide whether it’s safe. It has to check the command against over 1,000 known attack patterns. And it has to do it fast enough that you don’t notice.

Checking 1,000 patterns one by one would take too long. Checking them all at once requires a clever architecture. Here’s how it works.

GuardClaw’s detection engine uses four tiers of pattern matching, each faster or more precise than the last. Most safe actions clear the first tier in microseconds and never touch the deeper checks.

The speed problem

Your agent might make 50 tool calls per minute during an active session. Each one passes through GuardClaw. If the security check adds 100 milliseconds, that’s 5 seconds of overhead per minute, noticeable but tolerable. If it adds 1 second, the agent feels sluggish. If it adds 10 seconds, people disable it.

GuardClaw’s target is sub-millisecond. For most actions, it hits that.

The trick is not checking everything against everything. Instead, the engine uses increasingly precise checks, and most actions exit early.

Tier 1: Bloom filter (microseconds)

The first check is a Bloom filter, a data structure that can tell you “definitely not in the set” or “probably in the set,” but never gives a false negative.

GuardClaw loads all 1,000+ known threat indicators into a Bloom filter at startup. When an action arrives, the engine hashes key features of the action and checks the Bloom filter. If the filter says “not in the set,” the action is clean. Move on. This takes microseconds.

If the filter says “probably in the set,” the action moves to the next tier for a precise check. Bloom filters occasionally produce false positives (saying “probably yes” when the answer is actually no), but they never produce false negatives (they never say “no” when the answer is “yes”). This means tier 1 never lets a threat through, it just sometimes sends a clean action to tier 2 for a second opinion.

In practice, about 95% of safe actions clear tier 1 and never go further.

Tier 2: Aho-Corasick (microseconds to low milliseconds)

For actions that pass tier 1’s probabilistic check, tier 2 does exact string matching. The Aho-Corasick algorithm searches for multiple strings simultaneously in a single pass through the input.

Instead of checking “does this contain string A? does it contain string B? does it contain string C?” (which takes O(n × m) time), Aho-Corasick builds a state machine that checks all strings in one pass (O(n) time. N is the length of the input).

GuardClaw uses this tier for known dangerous strings, specific command names, specific domain names, specific file paths that should never appear in agent actions. If an exact match is found, the action is blocked.

Tier 3: RE2 regex (low milliseconds)

For actions that need pattern matching rather than exact string matching, tier 3 uses RE2, Google’s regex engine designed for safe, predictable performance.

Why RE2 specifically? Regular expressions can be exploited for denial-of-service attacks (a technique called ReDoS) where a crafted input causes the regex engine to take exponential time. RE2 guarantees linear time execution, no matter what the input looks like, the check completes in predictable time.

GuardClaw has 1,743 compiled RE2 patterns covering:

Prompt injection (564 patterns): Instruction overrides, role manipulation, context escaping
Blocked tools (275 patterns): Known dangerous MCP tools and skill signatures
Command injection (211 patterns): Shell escapes, pipe chains, encoded commands
SQL injection (139 patterns): Query manipulation, UNION attacks, comment injection
SSRF (85 patterns): Internal network access, metadata endpoint abuse
XSS (80 patterns): Script injection, event handler abuse
Path traversal (75 patterns): Directory escape, symlink abuse
Header injection (75 patterns): HTTP header manipulation
PII detection (71 patterns): Personal data patterns (emails, phone numbers, card numbers)
Other categories: URL exfiltration, output consistency, encoding evasion

These patterns are compiled once at startup. Runtime matching is fast.

Tier 4: Anomaly detection (milliseconds)

The final tier doesn’t look at the content of an action, it looks at the behavior pattern. Is this action unusual for this agent?

Anomaly detection tracks:

Rate: Is the agent making actions faster than normal? A sudden spike in shell commands might indicate an injection that’s causing the agent to loop.
Sequence: Is this a normal sequence of actions? An agent that always reads files then writes files, suddenly making network calls, is behaving differently.
Resource access: Is the agent accessing resources it hasn’t touched before? First-time access to a sensitive directory gets a higher anomaly score.

Anomaly detection produces a score, not a binary allow/deny. The score feeds into the policy engine, which decides based on configured thresholds whether to allow, flag, or block.

Before pattern matching: normalization

Attackers know about pattern matching. They try to evade it by encoding their payloads. A command injection hidden in Base64. A file path built from variable concatenation. A domain name written in Unicode characters that look like ASCII but aren’t.

Before any tier runs, GuardClaw normalizes the input:

Unicode normalization (NFKC): Converts look-alike characters (Cyrillic “а” that looks like Latin “a”) to their canonical forms
Zero-width character stripping: Removes invisible characters that could split keywords
Base64 decoding: If the input contains Base64-encoded data, it’s decoded and the decoded content is also checked
Variable expansion: Patterns like g="guard"; c="claw"; $g$c are expanded to guardclaw
HTML concealment detection: Content hidden in HTML tags or attributes is extracted

The normalized input is what gets checked against all four tiers. You can’t hide an attack behind encoding.

Why not use AI for detection?

The question comes up. Why not train a model to detect threats instead of maintaining 1,700+ patterns?

Three reasons:

Predictability. A pattern either matches or it doesn’t. The same input produces the same result every time. An AI model might flag something today and miss it tomorrow because of a minor change in the input distribution.
Auditability. When GuardClaw blocks an action, it tells you exactly which pattern matched. You can read the pattern, understand why it triggered, and decide whether it’s correct. With a model, the answer is “the model scored this as 0.87 threat probability.” That’s harder to audit.
Speed. Pattern matching at sub-millisecond latency is achievable with compiled patterns. Model inference, even optimized, adds meaningful latency that compounds across hundreds of actions per session.

We wrote more about this design choice in Why We Don’t Use AI to Make Security Decisions.

What to take from this

You don’t need to understand the internals to use GuardClaw. The detection engine is a black box from the outside — actions go in, decisions come out.

But knowing how it works helps you understand the trade-offs. The tiered architecture means most actions are checked in microseconds. The normalization pipeline means encoding tricks don’t work. And the deterministic approach means every decision is explainable and auditable.

Next post: setting up alerts and monitoring — how to know when something important happens without staring at the dashboard all day.

Cite this post

Take Interest Inc. (2026). The Detection Engine: How It Works. TAKE INTEREST. https://takeinterest.ai/blog/the-detection-engine-how-it-works

Take it with you

Save the link to come back to it, or pass it along.

Seven Layers of Defense for AI Agents

Most agent security stops at input filtering and output checks. Here is what real defense in depth looks like for agent systems.

How GuardClaw Is Different

There are other approaches to AI agent security. Here's where GuardClaw fits, what trade-offs we made, and why we made them.

Getting Started with GuardClaw

A step-by-step walkthrough of setting up GuardClaw, your first security layer for AI agents. From install to your first security report in five minutes.

Back to blog

The speed problem

Tier 1: Bloom filter (microseconds)

Tier 2: Aho-Corasick (microseconds to low milliseconds)

Tier 3: RE2 regex (low milliseconds)

Tier 4: Anomaly detection (milliseconds)

Before pattern matching: normalization

Why not use AI for detection?

What to take from this

The Detection Engine: How It Works

Related interests