Operations · Part 10 of The Builder's Guide to Agent Security

Score Yourself: The Operator Readiness Assessment

Take Interest Inc. · February 24, 2026 · 5 min read



In video games you can see your stats. In agent security, most teams have no idea where they stand. A 15-minute self-assessment across five dimensions tells you exactly what to fix next.


Figure: Readiness Radar. A five-axis assessment of security readiness, each axis scored 0-3.

Key takeaways

Five dimensions: Identity & Access, Input/Output Controls, Monitoring, Incident Response, Policy & Governance.

Score 0-3 per dimension. 0 = not started. 1 = ad hoc. 2 = documented. 3 = automated and tested.

The shape of your radar chart tells you more than the total score. Spikes and valleys show your real risk profile.

Video games let you see your stats. Health bar. Armor. Skill tree. You know exactly where you’re strong and weak. In agent security, most teams are playing blind.

A 15-minute self-assessment across five dimensions gives you a number you can actually use. Not a compliance checkmark. Not a marketing score. A real measurement of where you stand and what breaks first.

You’ll score yourself 0-3 on Identity & Access, Input/Output Controls, Monitoring, Incident Response, and Policy & Governance. The shape of your radar chart matters more than the total. Spikes and valleys show your actual risk profile. Then you share it with your team and everyone knows what the next sprint should be.

The Five Dimensions

Identity & Access Control

This is where your agents prove who they are and what they’re allowed to do.

Score 0: You don’t know what credentials your agents use. They might be hardcoded. You’re not rotating anything. Authentication is optional in some places.

Score 1: Your agents have credentials. You know most of them. Rotation happens sometimes. You’ve thought about least-privilege but haven’t enforced it everywhere.

Score 2: Every agent has a documented identity. You rotate credentials on a schedule. You’ve mapped least-privilege for major agents. You have a process for adding and removing access. It’s not automated but it’s documented and followed.

Score 3: Identity is automated. New agents get provisioned with the minimum permissions they need. Credentials rotate automatically. You audit access quarterly and revoke anything unused. Your system blocks over-privileged requests.
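To make the Score 3 description concrete, here is a minimal sketch of the kind of check an automated access review might run: flag credentials past their rotation window and permissions that were granted but never used. The `AgentIdentity` class, the `audit_identity` function, and the 90-day window are all illustrative assumptions, not something this guide prescribes.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Assumed rotation window; pick whatever your own policy says.
MAX_CREDENTIAL_AGE = timedelta(days=90)

@dataclass
class AgentIdentity:
    """Hypothetical record of one agent's credentials and permissions."""
    name: str
    last_rotated: date
    granted: set = field(default_factory=set)  # permissions the agent holds
    used: set = field(default_factory=set)     # permissions observed in use this period

def audit_identity(agent: AgentIdentity, today: date) -> dict:
    """Return the findings a periodic access review would act on."""
    return {
        "agent": agent.name,
        "rotation_overdue": today - agent.last_rotated > MAX_CREDENTIAL_AGE,
        "unused_permissions": sorted(agent.granted - agent.used),
    }

bot = AgentIdentity(
    name="billing-bot",
    last_rotated=date(2025, 1, 1),
    granted={"db:read", "db:write", "email:send"},
    used={"db:read"},
)
report = audit_identity(bot, today=date(2025, 6, 1))
```

In this example the report marks the credential as overdue and lists `db:write` and `email:send` as candidates for revocation.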

Input/Output Controls

This is your firewall for agent behavior. What can it receive? What can it send?

Score 0: Agents accept whatever input they get. There’s minimal validation. An agent might send output anywhere. You’re not filtering or rate-limiting anything.

Score 1: You’ve thought about input validation. Some agents have basic checks. You’ve identified your riskiest outputs (email, database writes, external APIs) and thought about controlling them. Not yet implemented everywhere.

Score 2: Input validation is documented and enforced on high-risk agents. Your top 5 dangerous actions require explicit approval or have rate limits. Output destinations are restricted. You log rejections.

Score 3: All agent inputs are validated against a schema. High-risk actions require human confirmation or are denied by default. Outputs are filtered before leaving your system. Rate limits are enforced. Your system automatically rejects malicious patterns. You log everything.
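The Score 3 controls above (schema validation, deny by default, human confirmation, rate limits) could be sketched as a single gate that every agent action passes through. The policy table, action names, and limits below are hypothetical; a real system would likely load them from configuration and use a proper schema library.

```python
import time
from collections import defaultdict, deque
from typing import Optional

# Hypothetical policy table. Any action not listed here is denied.
ACTION_POLICY = {
    "search_docs": {"required": {"query": str}, "needs_human": False, "per_minute": 30},
    "send_email": {"required": {"to": str, "body": str}, "needs_human": True, "per_minute": 5},
}

_recent_calls = defaultdict(deque)  # action name -> timestamps of allowed calls

def check_action(action: str, args: dict, approved: bool = False,
                 now: Optional[float] = None) -> tuple:
    """Return (allowed, reason). The caller should log every rejection."""
    now = time.monotonic() if now is None else now
    policy = ACTION_POLICY.get(action)
    if policy is None:
        return False, "unknown action (deny by default)"
    # Schema check: exactly the required fields, each with the expected type.
    required = policy["required"]
    if set(args) != set(required):
        return False, "unexpected or missing fields"
    for key, expected_type in required.items():
        if not isinstance(args[key], expected_type):
            return False, f"field {key!r} has wrong type"
    if policy["needs_human"] and not approved:
        return False, "human approval required"
    # Sliding one-minute window: drop old timestamps, then count what is left.
    window = _recent_calls[action]
    while window and now - window[0] >= 60:
        window.popleft()
    if len(window) >= policy["per_minute"]:
        return False, "rate limit exceeded"
    window.append(now)
    return True, "ok"
```

The order of checks is a design choice: cheap structural rejections come first, and only allowed calls count against the rate limit.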

Monitoring & Observability

You can’t defend what you can’t see.

Score 0: You don’t have structured logs of agent activity. When something goes wrong, you find out from users. You have no alerts set up.

Score 1: Your agents generate logs. You can search them if you know where to look. You have a vague sense of what’s normal. No alerts. You’re not monitoring for anomalies.

Score 2: Agent activity is logged to a central system (Datadog, Splunk, etc.). You have a baseline for normal behavior. Alerts are set up for obvious problems (repeated failures, repeated denied actions). You review logs weekly.

Score 3: Structured logging captures every agent action with timestamp, decision, and result. You have anomaly detection running. Your baseline is updated weekly. You alert on deviations in real time. You run monthly log reviews with your team. You can trace any user request through your entire system.
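As a rough illustration of the Score 3 description, the sketch below emits one structured log record per agent action (timestamp, decision, result) and applies a very simple anomaly test against a baseline. The field names, the `is_anomalous` helper, and the three-standard-deviation threshold are assumptions for the example; production anomaly detection is usually more involved.

```python
import json
import statistics
from datetime import datetime, timezone

def log_record(agent: str, action: str, decision: str, result: str) -> str:
    """Serialize one agent action as a JSON line."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "action": action,
        "decision": decision,  # e.g. "allowed" or "denied"
        "result": result,
    })

def is_anomalous(baseline: list, current: int, threshold: float = 3.0) -> bool:
    """Flag a count more than `threshold` standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mean
    return abs(current - mean) / stdev > threshold

# Hypothetical baseline: denied actions per hour over the last eight hours.
denied_per_hour = [10, 12, 11, 9, 10, 11, 12, 10]
spike = is_anomalous(denied_per_hour, 40)
normal = is_anomalous(denied_per_hour, 11)
```

Here a jump to 40 denied actions in an hour is flagged, while 11 falls inside the normal range.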

Incident Response

When something goes wrong, how fast can you stop it and recover?

Score 0: You don’t have an incident plan. If an agent goes rogue, you’ll figure it out as you go. You might take down the whole system to stop it.

Score 1: You’ve thought about what could go wrong. You have a rough idea of who would respond. You could probably rotate credentials if needed but it would take hours. You don’t have a runbook.

Score 2: You have a documented incident response plan. Your critical agents have assigned responders. You know how to revoke credentials, disable an agent, and audit what it did. You can do it in under an hour. You’ve practiced once.

Score 3: You have tested incident playbooks for your critical agents. You’ve run red-team exercises. You can rotate credentials in under 15 minutes. You have a war room process. You can audit agent activity within minutes of the alert. You drill your recovery plan quarterly.
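One way to find out whether you actually hit time targets like these is to script the playbook and time it during a drill. The sketch below is a hypothetical runner: the step names and the stub functions stand in for whatever your real disable, revoke, and audit procedures are. It keeps going after a failed step, on the assumption that a failure to disable an agent should not block revoking its credentials.

```python
import time
from typing import Callable

def run_playbook(steps: list, clock: Callable[[], float] = time.monotonic) -> list:
    """Run (name, function) steps in order and record status and elapsed seconds."""
    timeline = []
    start = clock()
    for name, step in steps:
        try:
            step()
            status = "ok"
        except Exception as exc:
            # Record the failure and continue with the remaining containment steps.
            status = f"failed: {exc}"
        timeline.append({"step": name, "status": status,
                         "elapsed_s": round(clock() - start, 3)})
    return timeline

# Stub steps for a drill; replace with calls into your own tooling.
def disable_agent() -> None:
    pass

def revoke_credentials() -> None:
    pass

drill = run_playbook([
    ("disable agent", disable_agent),
    ("revoke credentials", revoke_credentials),
])
```

The last entry's `elapsed_s` is the number to compare against your target, for example 900 seconds for the 15-minute mark.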

Policy & Governance

This is the layer that scales as you grow.

Score 0: No documented policies. Decisions about agent access happen ad hoc. Different teams approach security differently. No one knows what they’re supposed to do.

Score 1: You have a rough set of policies. Shared document on Slack. People mostly follow them. Some gaps. Some teams haven’t read them.

Score 2: Your security policies are documented, versioned, and shared with the team. Access reviews happen on a schedule. You have change management for agent deployments. New team members get trained on the policy.

Score 3: Policies are embedded in your systems. Access requests go through a tracked process. Every agent deployment requires a security review. Changes are audited. Policy violations trigger alerts. New team members get hands-on training plus certification.
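"Policies are embedded in your systems" can be as simple as a check that runs before every agent deployment. The sketch below assumes three illustrative rules (a security review on file, a named owner, no wildcard permissions); the `Deployment` fields and the rules themselves are hypothetical examples, not a required policy set.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Deployment:
    """Hypothetical description of an agent deployment request."""
    agent: str
    owner: Optional[str] = None
    security_review_id: Optional[str] = None
    permissions: set = field(default_factory=set)

def policy_violations(deployment: Deployment) -> list:
    """Return the list of violated rules; an empty list means the deploy may proceed."""
    violations = []
    if not deployment.security_review_id:
        violations.append("missing security review")
    if not deployment.owner:
        violations.append("no accountable owner")
    if any(p == "*" or p.endswith(":*") for p in deployment.permissions):
        violations.append("wildcard permission")
    return violations

request = Deployment(agent="support-bot", owner="support-team", permissions={"tickets:*"})
problems = policy_violations(request)
```

A deploy pipeline could fail the build when the list is non-empty and send the violations to whatever alerting you already use.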

How to Use This

Grab your team. Go through each dimension. Discuss where you actually are. Not where you want to be. Not where you think you are. Where you actually are.

Score each dimension 0-3. Honest scoring.

Plot your five scores on a radar chart. The shape matters more than the number. Say you score 3-3-0-1-2, in the order the dimensions appear above. Identity is locked down and your input/output controls are strong, but monitoring is basically not happening and incident response is ad hoc. You’re confident in your controls and flying blind on everything that happens after they fail. That’s your actual risk profile. When something breaks, you’re in trouble.
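To see why shape says more than the total, the sketch below computes radar coordinates and a small shape summary for the 3-3-0-1-2 example, then compares it with a flatter profile that has the same total. The function names and the "spread" measure (highest score minus lowest) are assumptions made for illustration.

```python
import math

# Axis order follows the order the dimensions appear in this guide.
DIMENSIONS = [
    "Identity & Access",
    "Input/Output Controls",
    "Monitoring & Observability",
    "Incident Response",
    "Policy & Governance",
]

def radar_points(scores: list) -> list:
    """Place each score on its axis; the first axis points up, the rest go clockwise."""
    if len(scores) != len(DIMENSIONS) or any(s not in (0, 1, 2, 3) for s in scores):
        raise ValueError("expected five scores, each 0-3")
    n = len(scores)
    points = []
    for i, score in enumerate(scores):
        angle = math.pi / 2 - 2 * math.pi * i / n
        points.append((round(score * math.cos(angle), 3),
                       round(score * math.sin(angle), 3)))
    return points

def shape_summary(scores: list) -> dict:
    """Total, plus two numbers that describe the shape: weakest axes and spread."""
    radar_points(scores)  # reuse the validation above
    low = min(scores)
    return {
        "total": sum(scores),
        "weakest": [d for d, s in zip(DIMENSIONS, scores) if s == low],
        "spread": max(scores) - low,
    }

spiky = shape_summary([3, 3, 0, 1, 2])
even = shape_summary([2, 2, 2, 2, 1])
```

Both profiles total 9, but the first has a spread of 3 with Monitoring & Observability at the bottom, while the second has a spread of 1. The total hides exactly the difference the radar makes visible.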

Share the radar with your team. Everyone should see it. Discuss: what’s the most dangerous gap?

That’s your next sprint focus. Not the lowest score. The gap that creates the most risk.
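One rough way to separate "lowest score" from "most risk" is to weight each gap by how exposed you are in that dimension. The formula below, gap times exposure, and the 1-5 exposure ratings are assumptions for illustration only; the ratings are something your team would have to argue out for its own systems.

```python
def rank_gaps(scores: dict, exposure: dict) -> list:
    """Rank dimensions by (3 - score) * exposure, highest risk first."""
    ranked = [(dim, (3 - score) * exposure[dim]) for dim, score in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Hypothetical team: agents hold production credentials, so identity exposure
# is rated high; the team is small, so governance exposure is rated low.
scores = {
    "Identity & Access": 1,
    "Input/Output Controls": 2,
    "Monitoring & Observability": 2,
    "Incident Response": 2,
    "Policy & Governance": 0,
}
exposure = {
    "Identity & Access": 5,
    "Input/Output Controls": 4,
    "Monitoring & Observability": 3,
    "Incident Response": 3,
    "Policy & Governance": 1,
}
priorities = rank_gaps(scores, exposure)
```

In this example Policy & Governance has the lowest score, yet Identity & Access comes out on top because a smaller gap there carries more risk.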

Run this assessment quarterly. Watch how your shape changes as you work through the gaps. When everything is 2 or higher, you’ve built a foundation. When everything is 3, you’ve scaled your practice. Most teams stop at 2 and that’s fine. You’ve got enough.
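The thresholds in the paragraph above translate directly into a small check you can run each quarter. The stage labels "foundation" and "scaled" come from the text; the label "building" for everything below that, and the function names, are assumptions added for the sketch.

```python
def maturity_stage(scores: list) -> str:
    """Map a set of dimension scores to the stages described above."""
    if all(s == 3 for s in scores):
        return "scaled"
    if all(s >= 2 for s in scores):
        return "foundation"
    return "building"  # assumed label for anything below a full row of 2s

def quarterly_change(previous: dict, current: dict) -> dict:
    """Per-dimension change since the last assessment (positive means improvement)."""
    return {dim: current[dim] - previous[dim] for dim in current}

last_quarter = {"Identity": 3, "I/O": 3, "Monitoring": 0, "Incident Response": 1, "Policy": 2}
this_quarter = {"Identity": 3, "I/O": 3, "Monitoring": 2, "Incident Response": 2, "Policy": 2}

stage_before = maturity_stage(list(last_quarter.values()))
stage_after = maturity_stage(list(this_quarter.values()))
change = quarterly_change(last_quarter, this_quarter)
```

Here the team moves from "building" to "foundation" by closing the monitoring and incident response gaps, without touching the dimensions that were already at 3.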

One More Thing

The goal isn’t a perfect score. The goal is knowing where you stand and being honest about it.

Teams that score themselves low but talk about it openly move faster than teams that claim 3s across the board but can’t explain their controls. Self-awareness beats false confidence.

So score yourself. Share it. Use it. Let it guide your next move.

You just took the assessment. Now run your team through it. Think you’re scoring yourself too high or too low? Go back and audit one agent end-to-end. Walk through its credentials, its inputs, its outputs, its logs, and how you would respond if it misbehaved. Scoring isn’t academic. It’s based on real systems.

Next: What We Got Wrong is where we admit the mistakes we made building this and what we changed because of them.
