Why We Don't Use AI to Make Security Decisions
Field Guide
We're an AI security company that doesn't use AI for deny/allow decisions. Probabilistic models are incredible for detection and triage. They are unreliable for enforcement. Here's why that distinction matters.
Decision Authority Matrix
Match decision frequency and consequence severity to the appropriate control approach.
Key takeaway
If your deny/allow logic runs through a model that can be persuaded, your security boundary is a suggestion, not a wall.
Key takeaway
Use LLMs for detection hints and triage. Use deterministic logic for enforcement. Match the tool to the consequence.
Key takeaway
Audit every deny path in your system. If any high-consequence decision is probabilistic, fix it this week.
“Wait. You’re an AI security company that doesn’t use AI for security decisions.”
Same reason you don’t use a coin flip to decide whether to lock your front door.
Summary: Probabilistic models are fantastic for suggesting what might be happening. They’re terrible for deciding what’s allowed to happen. We use LLMs to hint at problems and help humans triage. We use deterministic rules for deny paths. Match the tool to the consequence.
The appeal is obvious. LLMs are good at language. Security violations are patterns in language. So naturally you’d think: use the LLM as the security guard. Feed it a request. Have it say yes or no. That sounds efficient.
But here’s the problem. An LLM saying no can be negotiated with. It’s a probabilistic output. It can be persuaded. Change the prompt slightly. Try again. Add context. A human can test variations until the model says yes. We’ve watched this happen in adversarial research, with consistent bypass rates: the model was designed to say no to certain patterns, and a creative prompt still got through.
A deterministic rule doesn’t negotiate. Either you have permission or you don’t. Either the policy allows it or it doesn’t. There’s no persuasion path. No variation that makes the rule reconsider.
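To make that concrete, here’s a minimal sketch of what a non-negotiable rule looks like in code. The role names and permission table are illustrative assumptions, not from any real system:

```python
# Illustrative permission table: roles and actions are made up for this sketch.
ALLOWED_ACTIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}

def is_allowed(role: str, action: str) -> bool:
    """Pure lookup: the same inputs always produce the same answer.
    There is no prompt to rephrase, no context to add."""
    return action in ALLOWED_ACTIONS.get(role, set())
```

No amount of creative input changes what `is_allowed("viewer", "delete")` returns.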
This is a Human-AI parallel moment. Think about your own decision-making. “Should I try that new restaurant?” You use gut feel. Intuition. Probabilistic. You might be wrong. That’s okay. The consequence is a bad dinner. “Should I cross the street with a truck coming?” Hard rule. Zero tolerance. You don’t negotiate with physics. We naturally match decision mechanism to consequence.
We think security decisions are in the second category.
Here’s where people get confused. LLMs are incredible for detection. You want to know if a request looks suspicious. Feed it to a model. Get back a likelihood score. Use that to triage. Flag it for human review. That’s the right job. Detection is about suggesting what might be happening so humans can investigate.
Enforcement is different. Enforcement is about what actually happens. Can this request proceed. Can this tool be called. Can this file be accessed. If your answer to those questions comes from a probabilistic model, your security boundary is just a suggestion wearing a uniform.
We’ve watched teams build LLM-based guardrails. “We trained a model to detect jailbreaks.” Great. Now what. If the model says it’s a jailbreak, you block it. If the model says it’s not, you allow it. But the model was trained on examples. Attackers are creative. They try things the training set didn’t include. The model sees something novel and says “looks fine to me” and the request goes through.
It’s not the model’s fault. That’s just how probabilistic systems work. They’re based on patterns they’ve seen. Novel attacks are, by definition, patterns they haven’t seen.
The fix is to split the job. Use LLMs for triage. Structured scoring. “This request has characteristics of a prompt injection attempt.” That’s helpful. That reduces the noise for the humans who actually make the deny decision. But the deny decision itself comes from deterministic policy. The policy asks: Is the user allowed to call this function with these parameters. Is this operation allowed in this context. Those are rules, not probabilities.
Let’s say a user asks an agent to delete their account. The agent has permission to do that. The policy says it’s allowed. But the request came from an IP address the user has never used before at 3 AM on a Sunday. A probabilistic model would maybe flag that as suspicious. A deterministic anomaly check would compare it against baseline. If the deviation is beyond the threshold, the policy gets more restrictive. Maybe require additional verification. But the decision rule is still: “If anomaly score exceeds X, require Y.”
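That “if anomaly score exceeds X, require Y” rule can be written in a few lines. The threshold value and verification step name are illustrative:

```python
def verification_required(anomaly_score: float, threshold: float = 0.6) -> str:
    """The anomaly score may come from a statistical baseline, but the
    rule applied to it is fixed, testable, and explainable."""
    return "step_up_auth" if anomaly_score > threshold else "none"
```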
The rule is measurable. Testable. Auditable. You can explain to a regulator exactly why the system made this decision. You can’t do that with a neural network hidden state.
This is where we differ from a lot of AI-first security approaches. They want to use AI all the way through. We use AI where it’s good: detection, scoring, triage, suggesting patterns. We use deterministic logic where it matters: enforcement, policy, hard boundaries.
A reasonable objection: “But won’t your deterministic rules get stale?” Yes. That’s why you audit them quarterly. You look at the patterns of actual attacks. You update the rules. You test them. You measure performance. This is active maintenance, not set-it-and-forget-it. But it’s better than trusting a frozen model to adapt to novel attacks.
Another common pushback: “But LLM-based guardrails are more flexible.” They are. Flexible is great when flexibility is good. Flexible is terrible when you need a boundary. A lock doesn’t need to be flexible. It needs to work.
Here’s what we recommend: Audit every deny path in your system right now. For each one, ask: is this decision made by a deterministic rule or a probabilistic model? If it’s high consequence (blocks a user, denies access, triggers an escalation) and it’s probabilistic, fix it this week. Move the probability score into a triage signal, not an enforcement decision.
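The audit itself can be as simple as tagging each deny path by mechanism and consequence. The path names and records below are hypothetical examples:

```python
def paths_to_fix(deny_paths: list) -> list:
    """Return the names of high-consequence deny paths that rely on a model."""
    return [p["name"] for p in deny_paths
            if p["mechanism"] == "model" and p["consequence"] == "high"]

# Hypothetical inventory of deny paths in a system.
inventory = [
    {"name": "tool_call_gate",   "mechanism": "rule",  "consequence": "high"},
    {"name": "jailbreak_filter", "mechanism": "model", "consequence": "high"},
    {"name": "content_hint",     "mechanism": "model", "consequence": "low"},
]
```

Running `paths_to_fix(inventory)` surfaces `jailbreak_filter`: a model making a high-consequence call, which is exactly the pattern to move into a triage signal.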
Low-consequence paths can be more flexible. Suggestions. Recommendations. “You might want to review this.” Great, use a model. High-consequence paths need hard rules. “This is not allowed.” That comes from policy.
This is why GuardClaw is deterministic. Policy rules, not model inference. Every security decision is auditable and repeatable. See the architecture →
This connects back to Seven Layers of Defense. Because most of those layers need to be deterministic. Layer 4, the policy engine, is definitely not running LLM inference. Layer 2, input validation, is pattern matching, not model scoring. The layers that can use probabilistic models are detection and triage. Not enforcement.
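For a sense of what pattern-matching validation means at that layer, here’s a sketch. The two patterns are illustrative; a real deployment would maintain a much larger, regularly audited list:

```python
import re

# Illustrative patterns only -- not a production blocklist.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"\.\./"),  # path traversal fragment
]

def passes_input_validation(text: str) -> bool:
    """Deterministic: the same input always passes or always fails."""
    return not any(p.search(text) for p in BLOCKED_PATTERNS)
```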
References:
- Carlini, N., et al. (2021). “Extracting Training Data from Large Language Models.” USENIX Security Symposium. [How training data can be extracted from probabilistic models]
- Perez, F., & Ribeiro, I. (2022). “Ignore Previous Prompt: Attack Techniques For Language Models.” arXiv:2211.09527. [Analysis of prompt injection success rates against LLMs]
- Seshia, S. A., Sadigh, D., & Sastry, S. S. (2022). “Toward Verified Artificial Intelligence.” Communications of the ACM 65(7). [Formal verification vs. probabilistic boundaries]