When organizations test AI runtime security tools, the first instinct is often: "Let's throw a bunch of obviously bad prompts at it and see if it blocks them."
That's not wrong — but it's incomplete. If the only evaluation is whether the tool can block a handful of "bad" prompts, you miss the bigger picture: the real challenge is not blocking obviously malicious requests, but doing so without crushing legitimate, useful, and even security-critical queries.
The Precision Problem: Why Your AI Security Tool Might Be Doing More Harm Than Good
In the race to secure generative AI, the industry has become fixated on a single metric: the ability to stop a malicious prompt. We test our security tools against lists of known jailbreaks and prompt injection techniques, grading them on a simple pass/fail basis. While necessary, this narrow focus on blocking threats has created a dangerous blind spot: the collateral damage caused by false positives.
We are so focused on preventing malicious actors that we've failed to ask a critical question: how many legitimate users are we inadvertently stopping?
⚠️ The Precision Problem: This isn't a minor inconvenience; it's the precision problem, revealing a fundamental flaw in how we evaluate AI security. A tool that cannot distinguish between a malicious user and a curious developer isn't smart security—it's a blunt instrument.
The Anatomy of a "Good" Prompt Gone Wrong
Consider a common scenario. A cybersecurity analyst on your Red Team is tasked with creating realistic phishing email templates for employee training. She prompts her company-sanctioned LLM: "Generate three examples of urgent-sounding phishing emails related to a corporate password policy update."
A basic security tool, scanning for keywords like "phishing" and "password," immediately blocks the request. The tool has done its job according to its rules. But in reality, it has failed spectacularly. It has prevented a security professional from using an approved tool to improve the company's security.
This is the precision problem in action. The tool detects keywords, not context. It fails to recognize that the user's role, intent, and the nature of the task are legitimate.
The Ripple Effect: From Frustration to Shadow AI
What happens next is predictable. The analyst, pressed for time, tries rephrasing her prompt. Blocked again. After several failed attempts, she gives up on the "official" tool. Instead, she opens a new browser tab and uses a public, unsanctioned AI model that lacks such security controls.
🚨 Shadow AI Creation: She gets her work done, but the organization has lost. This marks the start of Shadow AI. By deploying an overly aggressive and imprecise security layer, we unintentionally push our most creative and skilled employees off our secure platforms. The result is a total loss of visibility, logging, and control—creating far more risk than the original prompt ever could.
An effective AI security solution should not be a barricade; it should be an intelligent, adaptable barrier. It must be precise enough to permit legitimate, even edgy, queries from trusted users while effectively inspecting and blocking genuine threats from unknown or untrusted sources.
Not All "Bad-Looking" Prompts Are Truly Malicious
Take a dataset of test prompts. On paper, many look the same: they mention exploits, injections, sensitive data, or attacks. But here's the tricky part: not all of them are actually malicious.
💡 Context Examples:
- A red teamer might ask, "How does a buffer overflow exploit work?" That's not an attack; that's their job.
- A compliance engineer might ask, "How do we handle PCI data in transit?" That's not data leakage; that's compliance.
- A developer might ask, "How do I test my system against prompt injection?" That's not an injection attempt; that's prevention work.
If your runtime tool is so strict that it blocks all of these, you end up with a tool that protects you… by preventing you from doing your work.
Overblocking Is Just as Bad as Underblocking
Think about it. A tool that blocks everything malicious-looking is easy to build. But then, your security team can't query the assistant about attack techniques. Your compliance officers can't ask about regulated data. Your developers can't explore edge cases.
⚠️ Shadow AI Risk: Overblocking creates frustration, slows productivity, and almost guarantees people will find ways to bypass the guardrails. Shadow AI is born this way — when controls are so tight that people stop using the "approved" tool and spin up their own unsanctioned ones. That's when your real security posture weakens.
As shown in the quadrant diagram below, runtime tools often fall into one of three traps — overblocking, underblocking, or flexible but imprecise — with only one quadrant, balanced & configurable, delivering both security and usability.
The Real Evaluation: Precision and Context
So what should an evaluation actually measure?
1. Precision and Recall
- Can the tool block genuine malicious requests (high recall) without flagging legitimate ones (high precision)?
- Both matter. A tool that catches 100% of attacks but blocks 50% of normal work isn't usable.
Take a simple example: imagine a red team engineer asks an AI assistant, "How does a buffer overflow exploit work?" A strict tool might block it as "malicious." But in reality, that engineer is doing their job — testing defenses. This is why context and configurability are non-negotiable in runtime tools.
2. Context Awareness
- Does the tool understand the difference between a student, a security engineer, and a customer support rep asking the same question?
- Without context, you're judging only by the words on the screen, and that's a recipe for false positives.
3. Configurability
- Who decides what's "malicious"? Not the vendor alone.
- A hospital may want to block all self-harm queries. A cybersecurity firm may want those same queries allowed for research purposes.
- The runtime tool should provide policy controls so each organization sets the bar for what's acceptable in their environment.
Who Gets to Decide What's Malicious?
And it's not just about prompts — it's about people. A compliance officer, a developer, and a security engineer all ask very different questions. The stakeholder wheel below makes this clear: runtime security needs to be role-aware, not one-size-fits-all.
This is the heart of it. No vendor can hand down a single, universal definition of "malicious." It's always context-dependent.
🎯 Three-Layer Decision Framework:
- The vendor provides baseline categories (prompt injection, toxic language, PII leakage, etc.).
- The customer decides which of those apply, in what ways, and how strictly.
- The runtime tool enforces that policy, ideally with multiple modes (block, warn, log) so customers can tune the response.
Without this, you're forcing one rigid definition on everyone — and that won't work.

Stakeholder decision framework for AI runtime security policy configuration
Shifting the Mindset: From Gatekeeper to Enabler
To address the precision problem, we need to change our evaluation criteria. Instead of asking "Does it block bad things?", we should ask:
🎯 Next-Generation Evaluation Questions:
- How context-aware is it? Can the tool distinguish between a developer testing for vulnerabilities and an attacker trying to exploit them? Does it integrate with identity systems to understand user roles and permissions?
- How granular are the policies? Can we tailor a policy that allows security teams to research malware while blocking all other employees? Can we impose stricter controls on external-facing applications than internal ones?
- How transparent is its reasoning? When the tool blocks a user, does it give clear, actionable feedback? Or does it offer a frustrating dead-end that prompts users to find workarounds?
True AI runtime security isn't about building the longest blocklist. It's about enabling the business to harness AI's power safely and effectively. The next generation of security will be defined not by what it blocks, but by the productive, innovative work it intelligently enables.
Living in the Grey Area
The truth is, this space is full of grey zones. The same string can be malicious in one context and completely benign in another. That's what makes runtime security harder than people expect.
The most effective tools don't try to erase the grey. Instead, they:
- Score risk (low/medium/high confidence).
- Offer different enforcement modes (log, alert, block).
- Provide audit trails so teams can review ambiguous cases.
- Support continuous tuning so the model adapts to the organization's needs.
The decision layer diagram shows how this works in practice. A prompt passes through the runtime security filter, which can block if malicious, warn if ambiguous, or allow if benign — before safely reaching the AI assistant.

Complete testing framework showing precision, recall, and context evaluation methods
Recommendation for Testing AI Runtime Security Tools
When you're evaluating tools — whether it's your own or a third party — here's the framework I recommend:
If you want a practical way to evaluate your current or third-party runtime tools, start with this simple checklist. These five items capture the balance between blocking threats and enabling work — and they'll save you from evaluating only on the "block list" mindset.
- Don't just test with "obviously bad" prompts. Mix in legitimate prompts that look similar, to see if the tool can tell the difference.
- Test both sides of the curve. How many bad things slip through (false negatives)? How many good things get blocked (false positives)?
- Evaluate configurability. Can you tailor rules to your business context, or is it one-size-fits-all?
- Check reporting and explainability. Can the tool show why it flagged a prompt? Transparency builds trust.
- Think user experience. A tool that frustrates users will be bypassed, and then your runtime protections don't matter.
"The goal isn't to block everything bad — it's to let real work through while keeping threats out."
The True Measure of Runtime Security
The real job of an AI runtime security tool isn't just to block attacks. It's to let people work confidently and safely — filtering out what truly matters, without suffocating legitimate use.
The goal isn't to block everything bad — it's to let real work through while keeping threats out. That's the true measure of runtime security. So when you test, don't just measure "did it block the dataset?" Ask instead: does it help my people do their jobs securely?
That's the balance we should be testing for.
💬 Your Experience: What's been your experience? Have you seen tools overblock or underblock in practice? I'd love to hear your stories and learn from your real-world testing challenges.