The Detection Engine: How It Works
GuardClaw checks 1,000+ patterns in under a millisecond. Here's the tiered architecture that makes that possible — Bloom filters, Aho-Corasick, RE2 regex, and anomaly detection.
Key takeaways
- Four detection tiers run in sequence: fast probabilistic check, exact string matching, pattern regex, then behavioral analysis. Most actions clear tier 1 in microseconds.
- No AI in the detection path. Every decision is deterministic: same input, same result, every time. Auditable and predictable.
- The engine normalizes Unicode, decodes Base64, and resolves variable splitting before pattern matching. Attackers can't hide behind encoding.
When your agent runs a shell command, GuardClaw needs to decide whether it’s safe. It has to check the command against over 1,000 known attack patterns. And it has to do it fast enough that you don’t notice.
Checking 1,000 patterns one by one would take too long. Checking them all at once requires a clever architecture. Here’s how it works.
GuardClaw’s detection engine uses four tiers of checks, each slower but more precise than the one before it. Most safe actions clear the first tier in microseconds and never touch the deeper checks.
The speed problem
Your agent might make 50 tool calls per minute during an active session. Each one passes through GuardClaw. If the security check adds 100 milliseconds, that’s 5 seconds of overhead per minute — noticeable but tolerable. If it adds 1 second, the agent feels sluggish. If it adds 10 seconds, people disable it.
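The arithmetic behind those figures is easy to reproduce. The sketch below assumes the 50 calls per minute quoted above; the function name is made up for illustration.

```python
CALLS_PER_MINUTE = 50  # figure quoted in the paragraph above

def overhead_seconds_per_minute(check_latency_ms: float,
                                calls: int = CALLS_PER_MINUTE) -> float:
    """Total time spent in security checks during one minute of agent activity."""
    return calls * check_latency_ms / 1000.0

for latency_ms in (0.5, 100, 1000):
    seconds = overhead_seconds_per_minute(latency_ms)
    print(f"{latency_ms} ms per check -> {seconds:.3f} s of overhead per minute")
```

At half a millisecond per check, the total is 25 milliseconds per minute, which is why a sub-millisecond budget matters.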
GuardClaw’s target is sub-millisecond. For most actions, it hits that.
The trick is not checking everything against everything. Instead, the engine uses increasingly precise checks, and most actions exit early.
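One way to picture the early-exit idea is a list of checks ordered from cheapest to most expensive, where each check either returns a verdict or escalates. This is a minimal sketch, not GuardClaw's code: the check functions, token list, and verdict strings are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A check returns "allow" or "block" to stop early, or None to escalate.
Check = Callable[[str], Optional[str]]

def run_tiers(action: str, tiers: List[Tuple[str, Check]]) -> Tuple[str, str]:
    """Run checks in order and stop at the first one that reaches a verdict."""
    for name, check in tiers:
        verdict = check(action)
        if verdict is not None:
            return verdict, name
    return "allow", "end of pipeline"

# Hypothetical stand-in checks, cheapest first.
SUSPICIOUS_TOKENS = {"curl", "rm", "chmod"}

def cheap_prefilter(action: str) -> Optional[str]:
    # No suspicious token at all: exit immediately without deeper checks.
    if not SUSPICIOUS_TOKENS & set(action.split()):
        return "allow"
    return None  # escalate to the next tier

def exact_check(action: str) -> Optional[str]:
    return "block" if "rm -rf /" in action else None

TIERS = [("prefilter", cheap_prefilter), ("exact", exact_check)]

print(run_tiers("ls -la", TIERS))                       # ('allow', 'prefilter')
print(run_tiers("rm -rf / --no-preserve-root", TIERS))  # ('block', 'exact')
print(run_tiers("rm notes.txt", TIERS))                 # ('allow', 'end of pipeline')
```

The first call never reaches the second check, which is the whole point: the expensive work only runs for the small share of actions the cheap check cannot clear.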
Tier 1: Bloom filter (microseconds)
The first check is a Bloom filter — a data structure that can tell you “definitely not in the set” or “probably in the set,” but never gives a false negative.
GuardClaw loads all 1,000+ known threat indicators into a Bloom filter at startup. When an action arrives, the engine hashes key features of the action and checks the Bloom filter. If the filter says “not in the set,” the action is clean. Move on. This takes microseconds.
If the filter says “probably in the set,” the action moves to the next tier for a precise check. Bloom filters occasionally produce false positives (saying “probably yes” when the answer is actually no), but they never produce false negatives (they never say “no” when the answer is “yes”). This means tier 1 never lets a threat through — it just sometimes sends a clean action to tier 2 for a second opinion.
In practice, about 95% of safe actions clear tier 1 and never go further.
Tier 2: Aho-Corasick (microseconds to low milliseconds)
For actions that tier 1 flags as possible matches, tier 2 does exact string matching. The Aho-Corasick algorithm searches for multiple strings simultaneously in a single pass through the input.
Instead of checking “does this contain string A? does it contain string B? does it contain string C?” one string at a time (roughly O(n × m) time for m strings over an input of length n), Aho-Corasick builds a state machine once, ahead of time, and then checks all strings in a single pass over the input (O(n) time, plus the number of matches reported).
GuardClaw uses this tier for known dangerous strings — specific command names, specific domain names, specific file paths that should never appear in agent actions. If an exact match is found, the action is blocked.
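The single-pass behavior is easier to see in code. Below is a compact, textbook Aho-Corasick in plain Python: a trie of the strings plus failure links. It is an illustrative sketch, and the blocked strings are hypothetical examples rather than GuardClaw's list.

```python
from collections import deque

def build_automaton(patterns):
    """Build the trie (goto), failure links (fail), and per-state outputs (out)."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: each state's failure link points to the state for
    # the longest proper suffix of its string that is also a trie prefix.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches ending here
    return goto, fail, out

def find_all(text, automaton):
    """Scan text once, returning (start_index, pattern) for every match."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits

# Hypothetical blocked strings, compiled once at startup.
BLOCKED = build_automaton(["/etc/shadow", "mkfifo", "169.254.169.254"])

print(find_all("cat /etc/shadow", BLOCKED))  # [(4, '/etc/shadow')]
print(find_all("ls -la", BLOCKED))           # []
```

The scan loop touches each input character once, no matter how many strings were compiled into the automaton. Adding more strings makes the automaton larger, not the scan slower.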
Tier 3: RE2 regex (low milliseconds)
For actions that need pattern matching rather than exact string matching, tier 3 uses RE2 — Google’s regex engine designed for safe, predictable performance.
Why RE2 specifically? Regular expressions can be exploited for denial-of-service attacks (a technique called ReDoS) where a crafted input causes a backtracking regex engine to take exponential time. RE2 guarantees matching time that is linear in the length of the input, at the cost of dropping features such as backreferences and lookaround. No matter what the input looks like, the check completes in predictable time.
GuardClaw has 1,743 compiled RE2 patterns covering:
- Prompt injection (564 patterns): Instruction overrides, role manipulation, context escaping
- Blocked tools (275 patterns): Known dangerous MCP tools and skill signatures
- Command injection (211 patterns): Shell escapes, pipe chains, encoded commands
- SQL injection (139 patterns): Query manipulation, UNION attacks, comment injection
- SSRF (85 patterns): Internal network access, metadata endpoint abuse
- XSS (80 patterns): Script injection, event handler abuse
- Path traversal (75 patterns): Directory escape, symlink abuse
- Header injection (75 patterns): HTTP header manipulation
- PII detection (71 patterns): Personal data patterns (emails, phone numbers, card numbers)
- Other categories: URL exfiltration, output consistency, encoding evasion
These patterns are compiled once at startup. Runtime matching is fast.
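The compile-once, match-many pattern looks roughly like this. Note the swap: this sketch uses Python's standard `re` module so it runs without extra dependencies, and `re` is a backtracking engine that does not carry RE2's linear-time guarantee. The rule IDs, categories, and patterns are hypothetical, chosen only to mirror the categories listed above.

```python
import re
from typing import List, NamedTuple, Optional

class Rule(NamedTuple):
    rule_id: str
    category: str
    regex: re.Pattern

# Hypothetical rules: (id, category, pattern source).
RAW_RULES = [
    ("PT-001", "path_traversal", r"(?:\.\./){2,}"),
    ("SSRF-001", "ssrf", r"https?://169\.254\.169\.254\b"),
    ("SQLI-001", "sql_injection", r"(?i)\bunion\s+select\b"),
]

# Compilation happens once, at startup, not on every action.
COMPILED: List[Rule] = [Rule(rule_id, category, re.compile(source))
                        for rule_id, category, source in RAW_RULES]

def first_match(text: str) -> Optional[Rule]:
    """Return the first rule that matches, so the decision names its pattern."""
    for rule in COMPILED:
        if rule.regex.search(text):
            return rule
    return None

hit = first_match("cat ../../../etc/passwd")
print(hit.rule_id, hit.category)   # PT-001 path_traversal
print(first_match("ls -la"))       # None
```

Returning the rule rather than a bare boolean keeps the match explainable: the caller can report exactly which pattern fired.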
Tier 4: Anomaly detection (milliseconds)
The final tier doesn’t look at the content of an action — it looks at the behavior pattern. Is this action unusual for this agent?
Anomaly detection tracks:
- Rate: Is the agent making actions faster than normal? A sudden spike in shell commands might indicate an injection that’s causing the agent to loop.
- Sequence: Is this a normal sequence of actions? An agent that always reads files then writes files, suddenly making network calls, is behaving differently.
- Resource access: Is the agent accessing resources it hasn’t touched before? First-time access to a sensitive directory gets a higher anomaly score.
Anomaly detection produces a score, not a binary allow/deny. The score feeds into the policy engine, which decides based on configured thresholds whether to allow, flag, or block.
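A score built from those three signals might be combined as follows. Everything specific here is an assumption made for illustration: the weights, the thresholds, the 60-second window, the separate `learn` step, and the class and function names. The source does not describe how GuardClaw computes or weights its score.

```python
from collections import deque

class AnomalyScorer:
    """Scores an action from rate, sequence, and resource novelty (0.0 to 1.0)."""

    def __init__(self, normal_rate_per_min: int = 50, window_s: float = 60.0):
        self.normal_rate = normal_rate_per_min
        self.window_s = window_s
        self.timestamps = deque()
        self.seen_transitions = set()
        self.seen_resources = set()
        self.last_kind = None

    def learn(self, kind: str, resource: str) -> None:
        """Record known-good behavior as the baseline."""
        self.seen_transitions.add((self.last_kind, kind))
        self.seen_resources.add(resource)
        self.last_kind = kind

    def score(self, kind: str, resource: str, now: float) -> float:
        # Rate: how far above the normal per-window count are we?
        self.timestamps.append(now)
        while now - self.timestamps[0] > self.window_s:
            self.timestamps.popleft()
        rate = min(1.0, max(0.0, len(self.timestamps) / self.normal_rate - 1.0))
        # Sequence: has this transition between action kinds been seen before?
        sequence = 0.0 if (self.last_kind, kind) in self.seen_transitions else 1.0
        # Resource: is this the first access to this resource?
        novelty = 0.0 if resource in self.seen_resources else 1.0
        self.last_kind = kind
        return 0.4 * rate + 0.3 * sequence + 0.3 * novelty  # hypothetical weights

def decide(score: float, flag_at: float = 0.25, block_at: float = 0.5) -> str:
    """Policy step: map a score to allow, flag, or block via thresholds."""
    if score >= block_at:
        return "block"
    return "flag" if score >= flag_at else "allow"

scorer = AnomalyScorer()
for kind in ("read", "write", "read"):
    scorer.learn(kind, "/workspace")

print(decide(scorer.score("write", "/workspace", now=0.0)))         # allow
print(decide(scorer.score("network", "/home/user/.ssh", now=1.0)))  # block
```

Keeping `decide` separate from `score` mirrors the split described above: the detector only measures how unusual an action is, and the thresholds that turn that number into a decision live in policy.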
Before pattern matching: normalization
Attackers know about pattern matching. They try to evade it by encoding their payloads. A command injection hidden in Base64. A file path built from variable concatenation. A domain name written in Unicode characters that look like ASCII but aren’t.
Before any tier runs, GuardClaw normalizes the input:
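- Unicode normalization (NFKC): Converts compatibility variants (fullwidth “ａ”, ligatures such as “ﬁ”) to their canonical forms. NFKC on its own does not map cross-script look-alikes such as Cyrillic “а” to Latin “a”; catching those requires a separate confusables mapping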
- Zero-width character stripping: Removes invisible characters that could split keywords
- Base64 decoding: If the input contains Base64-encoded data, it’s decoded and the decoded content is also checked
- Variable expansion: Patterns like g="guard"; c="claw"; $g$c are expanded to guardclaw
- HTML concealment detection: Content hidden in HTML tags or attributes is extracted
The normalized input is what gets checked against all four tiers. You can’t hide an attack behind encoding.
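Three of the steps above (NFKC, zero-width stripping, Base64 decoding) can be sketched with the Python standard library. This is an illustration under assumptions, not GuardClaw's pipeline: the zero-width character list, the 16-character minimum for a Base64 candidate, and the function names are all made up, and variable expansion and HTML extraction are left out.

```python
import base64
import binascii
import re
import unicodedata

# Invisible characters to delete (an illustrative, not exhaustive, list).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
# Runs that look like Base64; the minimum length of 16 is an arbitrary choice.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")

def normalize(text: str) -> str:
    """Fold compatibility forms with NFKC, then drop zero-width characters."""
    return unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH)

def decoded_payloads(text: str) -> list:
    """Decode every Base64-looking run that yields valid UTF-8 text."""
    found = []
    for candidate in BASE64_RUN.findall(text):
        try:
            raw = base64.b64decode(candidate, validate=True)
            found.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not real Base64, or not text: ignore it
    return found

def views_to_check(text: str) -> list:
    """Return the normalized input plus each decoded payload, all to be scanned."""
    clean = normalize(text)
    return [clean] + [normalize(payload) for payload in decoded_payloads(clean)]

encoded = base64.b64encode(b"cat /etc/shadow").decode("ascii")
print(normalize("cu\u200brl"))        # curl
print(normalize("\uff52\uff4d -rf"))  # rm -rf
print(views_to_check(f"echo {encoded} | base64 -d"))
```

The decoded payload is checked in addition to the original text, not instead of it, so a pattern that only shows up after decoding still gets a chance to match.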
Why not use AI for detection?
The question comes up. Why not train a model to detect threats instead of maintaining 1,700+ patterns?
Three reasons:
- Predictability. A pattern either matches or it doesn’t. The same input produces the same result every time. An AI model might flag something today and miss it tomorrow because of a minor change in the input distribution.
- Auditability. When GuardClaw blocks an action, it tells you exactly which pattern matched. You can read the pattern, understand why it triggered, and decide whether it’s correct. With a model, the answer is “the model scored this as 0.87 threat probability.” That’s harder to audit.
- Speed. Pattern matching at sub-millisecond latency is achievable with compiled patterns. Model inference, even optimized, adds meaningful latency that compounds across hundreds of actions per session.
We wrote more about this design choice in Why We Don’t Use AI to Make Security Decisions.
What to take from this
You don’t need to understand the internals to use GuardClaw. The detection engine is a black box from the outside — actions go in, decisions come out.
But knowing how it works helps you understand the trade-offs. The tiered architecture means most actions are checked in microseconds. The normalization pipeline means encoding tricks don’t work. And the deterministic approach means every decision is explainable and auditable.
Next post: setting up alerts and monitoring — how to know when something important happens without staring at the dashboard all day.