The Real Architecture of AI Agents
Part 1 of 10: Anatomy of a Production AI Agent
AI agents are not chatbots with tools.
They are execution runtimes that continuously translate untrusted model output into real-world side effects — file writes, shell commands, network requests, spawned subprocesses. That loop — not the model — is the attack surface.
We reverse-engineered the production source code of Claude Code, Anthropic's official CLI agent and arguably the most widely-used AI coding agent deployed today. The code ships inside the @anthropic-ai/claude-code npm package (v2.1.88), and we extracted it via source maps to reconstruct the original TypeScript.
What we found is not a chatbot. It is a loop-based operating system:
- A scheduler — concurrent tool orchestration with read/write partitioning
- Memory management — a 4-layer compaction pipeline rewriting context every turn
- A syscall layer — 40+ tools acting as the interface between AI intent and system resources
- A permission model — multi-source policy evaluation with classifier-assisted auto-approval
- A recovery system — escalation, fallback, and retry across three failure domains
Everything in this article proves that framing. No theory. No marketing. Line-number citations from the actual code.
The Fundamental Risk
Before we go deeper, name the thing clearly.
The model output is treated as executable intent. Every tool call — every file write, every shell command, every spawned subagent — originates from untrusted text generated by a probabilistic system. The entire architecture exists to make that safe. It does not eliminate the risk. It manages it.
Every iteration of the loop converts tokens into actions. The model is not "thinking." It is issuing instructions inside a control loop. Understanding the loop means understanding the risk.
1. The Boot Sequence
Before a single token is generated, Claude Code runs a multi-stage initialization pipeline.
The entry point is cli.js, which chains through entrypoints/cli.tsx into main.tsx. The first thing main.tsx does is fire three parallel prefetch operations before any other imports evaluate:
```typescript
// main.tsx:9-20
profileCheckpoint('main_tsx_entry');
startMdmRawRead();       // MDM policy via plutil/reg query
startKeychainPrefetch(); // macOS keychain reads (OAuth + legacy API key)
```
These are side effects at module import time. The MDM read spawns subprocesses to pull enterprise policy settings. The keychain prefetch fires both macOS keychain reads in parallel; the alternative, sequential synchronous spawns, costs ~65ms on every macOS startup (per source comments at main.tsx:7-8).
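The pattern is worth seeing in isolation. A minimal sketch of import-time prefetching, assuming invented names (`startKeychainPrefetch`, `getKeychainCredentials`) and a simulated slow read rather than the real `security`/`plutil` subprocess calls:

```typescript
// Kick off a slow read at module-import time, store the promise,
// and await it only at the point of use. Names are illustrative.
let keychainPromise: Promise<string> | undefined;

function startKeychainPrefetch(): void {
  // The real agent would spawn a subprocess here; we simulate latency.
  keychainPromise = new Promise((resolve) =>
    setTimeout(() => resolve("oauth-token"), 10)
  );
}

async function getKeychainCredentials(): Promise<string> {
  // Fall back to a fresh read if the prefetch never fired.
  if (!keychainPromise) startKeychainPrefetch();
  return keychainPromise!;
}

// Fired at import time so the read overlaps with the rest of startup.
startKeychainPrefetch();
```

The design choice is latency overlap: by the time anything awaits the credential, the read has usually already finished.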
After imports resolve, the startup continues with GrowthBook feature flag initialization, remote managed settings, policy limits, and MCP server prefetching. The trust dialog gate in interactiveHelpers.tsx verifies user acceptance before allowing code execution.
Before the agent processes its first prompt, it has already:
- Read enterprise MDM policies
- Accessed the OS keychain for credentials
- Initialized a feature flag system
- Loaded remote managed settings
- Prefetched MCP server configurations
- Enforced policy limits
- Verified user trust acceptance
This is not a script. This is a platform with a boot sequence.
Security implication: The boot sequence runs with full OS-level access before any user interaction. Enterprise policy, keychain credentials, and remote settings are all loaded implicitly. An attacker who can influence MDM policies, MCP configurations, or remote settings can shape agent behavior before the user even sees a prompt.
2. The Query Loop — Heart of the Agent
Every agent has a core loop. In Claude Code, that loop lives in query.ts and it is the single most important piece of code in the system.
The Infinite Loop
At query.ts:307, there is a while (true) loop. Every iteration: send messages to the API, receive a streaming response, execute requested tools, append results, repeat.
The loop is implemented as an async function* generator (query.ts:241), yielding events in real-time. Every assistant message, every tool result, every error flows out through yield statements as they happen.
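The shape of that generator can be sketched as follows. This is a minimal illustration of the loop structure described above, not the real implementation; the `Message`, `ModelReply`, `callModel`, and `runTool` types are invented stand-ins:

```typescript
// An async generator agent loop: send messages, execute any requested
// tool, append the result, yield events as they happen, repeat.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelReply = { text: string; toolCall?: { name: string; input: string } };

async function* queryLoop(
  messages: Message[],
  callModel: (msgs: Message[]) => Promise<ModelReply>,
  runTool: (name: string, input: string) => Promise<string>
): AsyncGenerator<Message> {
  while (true) {
    const reply = await callModel(messages);
    const assistant: Message = { role: "assistant", content: reply.text };
    messages.push(assistant);
    yield assistant;
    if (!reply.toolCall) return; // no tool requested: the loop terminates
    // This is the trust boundary: untrusted model output becomes a side effect.
    const result = await runTool(reply.toolCall.name, reply.toolCall.input);
    const toolMsg: Message = { role: "tool", content: result };
    messages.push(toolMsg);
    yield toolMsg;
  }
}
```

Even in this toy form, the security property is visible: nothing inside the loop distinguishes a safe tool call from a dangerous one. That distinction lives entirely in `runTool` and whatever gates it.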
The State Object
The loop carries mutable state across iterations (query.ts:268-279):
let state: State = {
messages: params.messages,
toolUseContext: params.toolUseContext,
maxOutputTokensOverride: params.maxOutputTokensOverride,
autoCompactTracking: undefined,
stopHookActive: undefined,
maxOutputTokensRecoveryCount: 0,
hasAttemptedReactiveCompact: false,
turnCount: 1,
pendingToolUseSummary: undefined,
transition: undefined,
}
The messages array is the conversation history. The toolUseContext carries permissions, app state, abort controllers, and file state caches. The transition field records why the previous iteration continued — which recovery path fired.
The 4-Layer Context Compaction Pipeline
Context management is where agents diverge most from chatbot architectures. Claude Code implements a four-layer compaction cascade on every loop iteration:
Layer 1: Tool Result Budget (query.ts:379-394). Aggregate tool result sizes are capped. Oversized content is replaced with references, persisted to disk for session resume. This is bandwidth control — it limits how much data any single tool execution can inject into context, constraining exfiltration throughput.
Layer 2: Snip Compaction (query.ts:401-410). Old conversation segments are removed. The tokensFreed count is plumbed downstream so the autocompact threshold check reflects what snip removed. Every snip destroys historical context. An attacker who can trigger aggressive snipping can erase evidence of earlier interactions.
Layer 3: Microcompaction (query.ts:414-426). Individual messages are compressed using cached strategies. Supports "cached microcompact" that edits the prompt cache directly. Compression rewrites conversation history — the model sees a summary, not the original. Injected content that survives summarization persists; content the summarizer drops is gone.
Layer 4: Autocompaction (query.ts:454-467). The nuclear option. An entirely separate agent summarizes the full conversation, replacing the message array. This is a full API call with its own cost and failure modes. This is model-generated memory. The summarizing model decides what to keep. Adversarial content designed to be "memorable" can survive compaction while legitimate context is dropped.
Between layers 2 and 4, an optional Context Collapse system (query.ts:440-447) archives messages and projects them as a read-time view, avoiding permanent discard.
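The ordering of the cascade (cheap truncation first, full model summarization last) can be sketched like this. All thresholds and names are invented for illustration, layer 3 (per-message compression) is omitted for brevity, and the real pipeline operates on tokens, not characters:

```typescript
// Cascading compaction sketch: each layer only fires if the cheaper
// layers above it did not bring context back under budget.
interface Msg { content: string }

const TOOL_RESULT_BUDGET = 50;    // layer 1: per-result size cap (illustrative)
const SNIP_KEEP = 12;             // layer 2: keep only the newest N messages
const CONTEXT_CHAR_BUDGET = 400;  // layer 4 trigger: total context size

function compact(messages: Msg[], summarize: (msgs: Msg[]) => Msg): Msg[] {
  // Layer 1: oversized tool results become references (persisted elsewhere).
  let out = messages.map((m) =>
    m.content.length > TOOL_RESULT_BUDGET
      ? { content: "[truncated: persisted to disk]" }
      : m
  );
  // Layer 2: snip — drop the oldest messages past a retention window.
  if (out.length > SNIP_KEEP) out = out.slice(-SNIP_KEEP);
  // Layer 4: autocompact — replace everything with a model-written summary.
  // This is the step where the summarizer decides what survives.
  const size = out.reduce((n, m) => n + m.content.length, 0);
  if (size > CONTEXT_CHAR_BUDGET) out = [summarize(out)];
  return out;
}
```

The sketch makes the adversarial observation concrete: layer 4 is the only layer where a model, not a rule, decides what is kept.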
Calling the API
The API call happens at query.ts:659-708. Messages are prepended with user context, wrapped with the system prompt, and sent with configuration including the current model, thinking config, tool definitions, fast mode, fallback model, MCP tool state, effort value, and task budget parameters.
The model can change mid-session based on permission mode and token pressure.
Error Recovery
Three recovery paths, and they compose:
- Prompt too long: Context collapse drain, reactive compaction, or a blocking limit error (query.ts:637-647).
- Max output tokens: Escalation from the default to 64K (ESCALATED_MAX_TOKENS in utils/context.ts:25), retrying up to 3 times (MAX_OUTPUT_TOKENS_RECOVERY_LIMIT at query.ts:164).
- Model fallback: The primary model fails during streaming — discard partial results, tombstone orphaned messages, create a fresh StreamingToolExecutor, retry with the fallback model (query.ts:709-741).
A single iteration can hit max output tokens, escalate, retry, hit prompt-too-long on the retry, compact, and then succeed. The transition field tracks which path fired.
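The escalate-and-retry pattern behind the second path can be sketched as follows. The constants mirror the cited names but their values and the error signaling are assumptions for illustration:

```typescript
// Retry an API call, escalating the output-token limit after the first
// max_output_tokens failure, bounded by a recovery limit.
const DEFAULT_MAX_TOKENS = 8192;     // illustrative default
const ESCALATED_MAX_TOKENS = 64000;  // cf. ESCALATED_MAX_TOKENS
const RECOVERY_LIMIT = 3;            // cf. MAX_OUTPUT_TOKENS_RECOVERY_LIMIT

async function callWithRecovery(
  call: (maxTokens: number) => Promise<string>
): Promise<string> {
  let maxTokens = DEFAULT_MAX_TOKENS;
  for (let attempt = 0; attempt <= RECOVERY_LIMIT; attempt++) {
    try {
      return await call(maxTokens);
    } catch (err) {
      if ((err as Error).message !== "max_output_tokens") throw err; // not ours
      maxTokens = ESCALATED_MAX_TOKENS; // escalate once, keep retrying after
    }
  }
  throw new Error("recovery limit exceeded");
}
```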
Security implication: Recovery paths create secondary execution flows. Each retry, fallback, and compaction is a code path that must be secured independently. An attacker who can reliably trigger prompt-too-long conditions forces the agent through compaction — rewriting context under adversary influence.
3. The Tool Execution Model
Tools are what make an agent an agent rather than a chatbot. Claude Code's tool system is a concurrent execution engine with scheduling, permission gating, and real-time streaming.
Concurrency: Read-Only Parallel, Write Serial
The orchestration system in toolOrchestration.ts partitions tool calls into batches. Consecutive concurrency-safe (read-only) tools execute in parallel, up to 10 (toolOrchestration.ts:10):
```typescript
// toolOrchestration.ts:10
parseInt(process.env.CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY || '', 10) || 10
```
Non-concurrency-safe (write) tools run serially with exclusive access. If isConcurrencySafe throws, the tool is treated as non-concurrent (toolOrchestration.ts:101-107). Conservative default.
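The batching rule described above can be sketched as a simple partition. The `ToolCall` shape and function name are invented; only the discipline (consecutive reads group into a capped parallel batch, every write runs alone) follows the text:

```typescript
// Partition tool calls into batches: consecutive read-only calls share
// one parallel batch (up to the concurrency cap); writes run serially.
interface ToolCall { name: string; readOnly: boolean }

const MAX_CONCURRENCY = 10; // mirrors the env-var default cited above

function partition(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    const canJoin =
      call.readOnly &&
      last !== undefined &&
      last.every((c) => c.readOnly) &&
      last.length < MAX_CONCURRENCY;
    if (canJoin) {
      last!.push(call); // extend the current parallel read-only batch
    } else {
      batches.push([call]); // writes (and new runs of reads) start a batch
    }
  }
  return batches;
}
```

The conservative fallback noted above fits naturally here: anything whose safety cannot be determined is treated as `readOnly: false` and serialized.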
The Streaming Tool Executor
The StreamingToolExecutor (StreamingToolExecutor.ts:40) begins executing tools while the model is still streaming its response. As each tool_use block arrives from the API stream, it is immediately queued:
```
API Stream:  [text...] [tool_use_1] [text...] [tool_use_2] [tool_use_3]
                            |                      |            |
                        start exec             start exec   queue (wait)
```
The executor maintains a queue of TrackedTool objects with statuses: queued, executing, completed, yielded (StreamingToolExecutor.ts:19). When a Bash tool errors, a child abort controller fires to kill sibling subprocesses (StreamingToolExecutor.ts:46-48), but does NOT abort the parent — the query loop continues.
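The queue discipline can be sketched as follows. The status names mirror the cited `TrackedTool` states; the class itself, its serial drain chain, and the `run` callback are invented for illustration (the real executor also parallelizes read-safe tools):

```typescript
// Tools are enqueued the moment their tool_use block arrives from the
// stream; a drain chain executes them in order while the stream continues.
type Status = "queued" | "executing" | "completed";
interface TrackedTool { name: string; status: Status }

class StreamingExecutorSketch {
  private queue: TrackedTool[] = [];
  private draining: Promise<void> = Promise.resolve();

  constructor(private run: (name: string) => Promise<void>) {}

  // Called per tool_use block — before the full response exists.
  enqueue(name: string): void {
    const tool: TrackedTool = { name, status: "queued" };
    this.queue.push(tool);
    this.draining = this.draining.then(async () => {
      tool.status = "executing";
      await this.run(tool.name);
      tool.status = "completed";
    });
  }

  async done(): Promise<TrackedTool[]> {
    await this.draining;
    return this.queue;
  }
}
```

Note that `enqueue` returns immediately: execution is decoupled from arrival, which is exactly the race window the security implication below describes.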
Security implication: Tool execution begins before the full model response is available. This creates a race condition window where partially generated intent can trigger side effects before the model "finishes thinking." A carefully crafted context could cause the model to emit a dangerous tool call early in its response, executing before subsequent reasoning would have prevented it.
The Permission Check Cascade
Before any tool executes, it passes through a multi-layer cascade:
- Always-allow rules: Per-source rules that auto-approve
- Always-deny rules: Hard blocks, cannot be overridden
- Always-ask rules: Force user prompt regardless
- Permission mode: default, plan, auto, or bypassPermissions
- Automated checks: Classifier and hooks can pre-decide
- User prompt: The final fallback — ask the human
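A stripped-down precedence walk over that cascade might look like this. The rule shapes, string matching, and exact ordering are assumptions; the real evaluator aggregates rules from multiple sources with a far richer precedence chain:

```typescript
// Evaluate one tool call against a precedence-ordered rule cascade:
// deny beats everything, forced-ask beats allow, modes apply last.
type Decision = "allow" | "deny" | "ask";
interface Rules { deny: string[]; allow: string[]; alwaysAsk: string[] }

function evaluate(tool: string, rules: Rules, mode: string): Decision {
  if (rules.deny.includes(tool)) return "deny";     // hard block, unoverridable
  if (rules.alwaysAsk.includes(tool)) return "ask"; // forced user prompt
  if (rules.allow.includes(tool)) return "allow";   // auto-approve
  if (mode === "bypassPermissions") return "allow"; // mode-level override
  return "ask";                                     // final fallback: the human
}
```

Even this toy version shows why precedence bugs matter: swap any two of the first three checks and the security properties of the whole system change.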
Security implication: This is a multi-layer policy system with competing sources of truth — a classic setup for precedence bugs and policy bypass. Seven different rule sources feed into the decision. The "correct" answer depends on which source wins, and the precedence chain is complex enough that edge cases are inevitable (see Article 4).
4. Context Management — The Hidden Complexity
The SYSTEM_PROMPT_DYNAMIC_BOUNDARY
The system prompt is split at a marker (prompts.ts:114-115):
```
[Static sections]   <-- Cached by API (identical across all users)
__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__
[Dynamic sections]  <-- Recomputed each turn (memory, MCP, env info)
```
Static content — identity, tool instructions, coding guidelines — benefits from prompt caching across all users. Dynamic content — CLAUDE.md, MCP server instructions, environment info — can change without busting the cache.
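The caching property hinges on one invariant: everything before the marker must be byte-identical across turns. A sketch of that assembly, where the marker name matches the cited constant but the assembly functions are invented:

```typescript
// Assemble a system prompt around a static/dynamic boundary so the
// static prefix stays stable (and therefore cacheable) across turns.
const BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

function assemblePrompt(staticSections: string[], dynamicSections: string[]): string {
  return [...staticSections, BOUNDARY, ...dynamicSections].join("\n");
}

// Everything before the marker is the prefix the API can cache.
function cacheablePrefix(prompt: string): string {
  return prompt.slice(0, prompt.indexOf(BOUNDARY));
}
```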
The dynamic sections use a registry pattern (prompts.ts:491-555) separating cached sections from DANGEROUS_uncachedSystemPromptSection sections that recompute every turn. The function name is unusually honest: it marks injection points where dynamic content enters the prompt.
Context Injection
System context (context.ts:116-150): Git status (memoized at startup — a snapshot, not live), optional cache-breaking injection.
User context (context.ts:155-189): CLAUDE.md file contents and current date. CLAUDE.md loading is disabled in bare mode when no explicit --add-dir was provided. The loaded content is cached separately for the auto-mode classifier to read without creating an import cycle.
Security implication: Every dynamic injection point — CLAUDE.md, MCP instructions, memory — is a surface where untrusted content enters the system prompt. These are not bugs. They are architectural decisions that trade functionality for attack surface (see Article 2).
Maintaining Coherence
In long conversations (100+ turns), coherence survives through:
- Compaction summaries that preserve key facts, decisions, and file paths
- pendingToolUseSummary — post-hoc summaries of tool use blocks injected into subsequent iterations
- Memory prefetch (query.ts:301-304) — a side-query at the start of each user turn to find relevant memories
- Session persistence — transcripts written to disk for /resume across process restarts
5. What This Means for Agent Security
The Loop Is the Attack Surface
The while (true) loop at query.ts:307 is a trust boundary. On one side: the system prompt, the user's instructions, the tool definitions. On the other: the model's response — untrusted content that drives tool execution. The model decides which tools to call, with what arguments, in what order. The permission system mediates, but the loop itself will execute whatever the model requests, constrained only by the permission cascade and each tool's validation.
Every iteration converts tokens into actions. The compaction pipeline rewrites context. The streaming executor races ahead of model completion. The recovery system creates secondary execution paths. Each of these is a surface.
Tool Execution Creates Real Side Effects
When the model calls BashTool, it runs shell commands on the host. When it calls FileWriteTool, it writes to the filesystem. When it calls AgentTool, it spawns an entirely new agent with its own query loop, context, and tool access. These are not sandbox operations by default — they are real side effects on real systems, controlled by AI judgment and gated by a permission model that, in auto mode, can approve operations without human review.
What Comes Next
This article mapped the architecture. The remaining nine parts of this series dissect every layer:
- Part 2: System Prompts Are Not Strings — They're Pipelines — How multi-section, cached/uncached prompt assembly creates a programmable control plane
- Part 3: The AI Agent Attack Surface: Plugins, MCP, and Hooks — The three escalation planes that extend the agent beyond its built-in tools
- Part 4: Why AI Permission Systems Are the New Kernel Security — Mapping the permission model to OS security concepts
- Part 5: Inside a Real AI Guardrail System (And Where It Breaks) — Six layers of defense and six failure modes
- Part 6: Subagents Are Just Sandboxed Processes (And They Leak) — How child agents inherit more than intended
- Part 7: Multi-Surface Prompt Injection in Agent Systems — Five injection surfaces in one product
- Part 8: Agent Security Is the New Cloud Security — Why CISOs should apply cloud security thinking to agents
- Part 9: The Agent Security Top 10 — An evidence-based risk taxonomy from production code
- Part 10: Why You Don't Know How Many AI Agents Are Running — The visibility gap enterprises must close
Conclusion
The most important takeaway from reading this codebase: the attack surface is not the model. The attack surface is the loop.
The model is a prediction engine. The loop is what gives those predictions teeth — scheduling execution, managing memory, enforcing permissions, recovering from failures, and spawning child processes. Every security question about AI agents is ultimately a question about the loop.
If your security team evaluates AI agents using the mental model of "it's just an API wrapper," you are missing the architecture that determines your risk.
Scott Thornton is an AI security researcher at perfecXion.ai, specializing in defensive research on LLM and agent vulnerabilities. All analysis was conducted on lawfully obtained, publicly distributed npm package code in an authorized research environment.