The Real Architecture of AI Agents
Part 1 of 10: Anatomy of a Production AI Agent
AI agents are not chatbots with tools.
They are execution runtimes that continuously translate untrusted model output into real-world side effects — file writes, shell commands, network requests, spawned subprocesses. That loop — not the model — is the attack surface.
We reverse-engineered the production source code of Claude Code, Anthropic's official CLI agent and arguably the most widely-used AI coding agent deployed today. The code ships inside the @anthropic-ai/claude-code npm package (v2.1.88), and we extracted it via source maps to reconstruct the original TypeScript.
What we found is not a chatbot. It is a loop-based operating system:
- A scheduler — concurrent tool orchestration with read/write partitioning
- Memory management — a 4-layer compaction pipeline rewriting context every turn
- A syscall layer — 40+ tools acting as the interface between AI intent and system resources
- A permission model — multi-source policy evaluation with classifier-assisted auto-approval
- A recovery system — escalation, fallback, and retry across three failure domains
Everything in this article proves that framing. No theory. No marketing. Line-number citations from the actual code.
The Fundamental Risk
Before we go deeper, name the thing clearly.
The model output is treated as executable intent. Every tool call — every file write, every shell command, every spawned subagent — originates from untrusted text generated by a probabilistic system. The entire architecture exists to make that safe. It does not eliminate the risk. It manages it.
Every iteration of the loop converts tokens into actions. The model is not "thinking." It is issuing instructions inside a control loop. Understanding the loop means understanding the risk.
1. The Boot Sequence
Before a single token is generated, Claude Code runs a multi-stage initialization pipeline.
The entry point is cli.js, which chains through entrypoints/cli.tsx into main.tsx. The first thing main.tsx does is fire three parallel prefetch operations before any other imports evaluate:
```typescript
// main.tsx:9-20
profileCheckpoint('main_tsx_entry');
startMdmRawRead();       // MDM policy via plutil/reg query
startKeychainPrefetch(); // macOS keychain reads (OAuth + legacy API key)
```
These are side effects at module import time. The MDM read spawns subprocesses to pull enterprise policy settings. The keychain prefetch fires both macOS keychain reads in parallel; the alternative, sequential synchronous spawns, costs ~65ms on every macOS startup (per source comments at main.tsx:7-8).
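The pattern is worth seeing in isolation. A minimal sketch of import-time prefetching, assuming invented names (`startKeychainPrefetch`, `getKeychainCredentials`) and a simulated slow read rather than the real `security`/`plutil` subprocess calls:

```typescript
// Kick off a slow read at module-import time, store the promise,
// and await it only at the point of use. Names are illustrative.
let keychainPromise: Promise<string> | undefined;

function startKeychainPrefetch(): void {
  // The real agent would spawn a subprocess here; we simulate latency.
  keychainPromise = new Promise((resolve) =>
    setTimeout(() => resolve("oauth-token"), 10)
  );
}

async function getKeychainCredentials(): Promise<string> {
  // Fall back to a fresh read if the prefetch never fired.
  if (!keychainPromise) startKeychainPrefetch();
  return keychainPromise!;
}

// Fired at import time so the read overlaps with the rest of startup.
startKeychainPrefetch();
```

The design choice is latency overlap: by the time anything awaits the credential, the read has usually already finished.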
After imports resolve, the startup continues with GrowthBook feature flag initialization, remote managed settings, policy limits, and MCP server prefetching. The trust dialog gate in interactiveHelpers.tsx verifies user acceptance before allowing code execution.
Before the agent processes its first prompt, it has already:
- Read enterprise MDM policies
- Accessed the OS keychain for credentials
- Initialized a feature flag system
- Loaded remote managed settings
- Prefetched MCP server configurations
- Enforced policy limits
- Verified user trust acceptance
This is not a script. This is a platform with a boot sequence.
Security implication: The boot sequence runs with full OS-level access before any user interaction. Enterprise policy, keychain credentials, and remote settings are all loaded implicitly. An attacker who can influence MDM policies, MCP configurations, or remote settings can shape agent behavior before the user even sees a prompt.
2. The Query Loop — Heart of the Agent
Every agent has a core loop. In Claude Code, that loop lives in query.ts and it is the single most important piece of code in the system.
The Infinite Loop
At query.ts:307, there is a while (true) loop. Every iteration: send messages to the API, receive a streaming response, execute requested tools, append results, repeat.
The loop is implemented as an async function* generator (query.ts:241), yielding events in real-time. Every assistant message, every tool result, every error flows out through yield statements as they happen.
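The shape of that generator can be sketched as follows. This is a minimal illustration of the loop structure described above, not the real implementation; the `Message`, `ModelReply`, `callModel`, and `runTool` types are invented stand-ins:

```typescript
// An async generator agent loop: send messages, execute any requested
// tool, append the result, yield events as they happen, repeat.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelReply = { text: string; toolCall?: { name: string; input: string } };

async function* queryLoop(
  messages: Message[],
  callModel: (msgs: Message[]) => Promise<ModelReply>,
  runTool: (name: string, input: string) => Promise<string>
): AsyncGenerator<Message> {
  while (true) {
    const reply = await callModel(messages);
    const assistant: Message = { role: "assistant", content: reply.text };
    messages.push(assistant);
    yield assistant;
    if (!reply.toolCall) return; // no tool requested: the loop terminates
    // This is the trust boundary: untrusted model output becomes a side effect.
    const result = await runTool(reply.toolCall.name, reply.toolCall.input);
    const toolMsg: Message = { role: "tool", content: result };
    messages.push(toolMsg);
    yield toolMsg;
  }
}
```

Even in this toy form, the security property is visible: nothing inside the loop distinguishes a safe tool call from a dangerous one. That distinction lives entirely in `runTool` and whatever gates it.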
The State Object
The loop carries mutable state across iterations (query.ts:268-279):
let state: State = {
messages: params.messages,
toolUseContext: params.toolUseContext,
maxOutputTokensOverride: params.maxOutputTokensOverride,
autoCompactTracking: undefined,
stopHookActive: undefined,
maxOutputTokensRecoveryCount: 0,
hasAttemptedReactiveCompact: false,
turnCount: 1,
pendingToolUseSummary: undefined,
transition: undefined,
}
The messages array is the conversation history. The toolUseContext carries permissions, app state, abort controllers, and file state caches. The transition field records why the previous iteration continued — which recovery path fired.
The 4-Layer Context Compaction Pipeline
Context management is where agents diverge most from chatbot architectures. Claude Code implements a four-layer compaction cascade on every loop iteration:
Layer 1: Tool Result Budget (query.ts:379-394). Aggregate tool result sizes are capped. Oversized content is replaced with references, persisted to disk for session resume. This is bandwidth control — it limits how much data any single tool execution can inject into context, constraining exfiltration throughput.
Layer 2: Snip Compaction (query.ts:401-410). Old conversation segments are removed. The tokensFreed count is plumbed downstream so the autocompact threshold check reflects what snip removed. Every snip destroys historical context. An attacker who can trigger aggressive snipping can erase evidence of earlier interactions.
Layer 3: Microcompaction (query.ts:414-426). Individual messages are compressed using cached strategies. Supports "cached microcompact" that edits the prompt cache directly. Compression rewrites conversation history — the model sees a summary, not the original. Injected content that survives summarization persists; content the summarizer drops is gone.
Layer 4: Autocompaction (query.ts:454-467). The nuclear option. An entirely separate agent summarizes the full conversation, replacing the message array. This is a full API call with its own cost and failure modes. This is model-generated memory. The summarizing model decides what to keep. Adversarial content designed to be "memorable" can survive compaction while legitimate context is dropped.
Between layers 2 and 4, an optional Context Collapse system (query.ts:440-447) archives messages and projects them as a read-time view, avoiding permanent discard.
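The ordering of the cascade (cheap truncation first, full model summarization last) can be sketched like this. All thresholds and names are invented for illustration, layer 3 (per-message compression) is omitted for brevity, and the real pipeline operates on tokens, not characters:

```typescript
// Cascading compaction sketch: each layer only fires if the cheaper
// layers above it did not bring context back under budget.
interface Msg { content: string }

const TOOL_RESULT_BUDGET = 50;    // layer 1: per-result size cap (illustrative)
const SNIP_KEEP = 12;             // layer 2: keep only the newest N messages
const CONTEXT_CHAR_BUDGET = 400;  // layer 4 trigger: total context size

function compact(messages: Msg[], summarize: (msgs: Msg[]) => Msg): Msg[] {
  // Layer 1: oversized tool results become references (persisted elsewhere).
  let out = messages.map((m) =>
    m.content.length > TOOL_RESULT_BUDGET
      ? { content: "[truncated: persisted to disk]" }
      : m
  );
  // Layer 2: snip — drop the oldest messages past a retention window.
  if (out.length > SNIP_KEEP) out = out.slice(-SNIP_KEEP);
  // Layer 4: autocompact — replace everything with a model-written summary.
  // This is the step where the summarizer decides what survives.
  const size = out.reduce((n, m) => n + m.content.length, 0);
  if (size > CONTEXT_CHAR_BUDGET) out = [summarize(out)];
  return out;
}
```

The sketch makes the adversarial observation concrete: layer 4 is the only layer where a model, not a rule, decides what is kept.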
Calling the API
The API call happens at query.ts:659-708. Messages are prepended with user context, wrapped with the system prompt, and sent with configuration including the current model, thinking config, tool definitions, fast mode, fallback model, MCP tool state, effort value, and task budget parameters.
The model can change mid-session based on permission mode and token pressure.
Error Recovery
Three recovery paths, and they compose:
- Prompt too long: Context collapse drain, reactive compaction, or a blocking limit error (query.ts:637-647).
- Max output tokens: Escalation from the default to 64K (ESCALATED_MAX_TOKENS in utils/context.ts:25), retrying up to 3 times (MAX_OUTPUT_TOKENS_RECOVERY_LIMIT at query.ts:164).
- Model fallback: The primary model fails during streaming — discard partial results, tombstone orphaned messages, create a fresh StreamingToolExecutor, retry with the fallback model (query.ts:709-741).
A single iteration can hit max output tokens, escalate, retry, hit prompt-too-long on the retry, compact, and then succeed. The transition field tracks which path fired.
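The escalate-and-retry pattern behind the second path can be sketched as follows. The constants mirror the cited names but their values and the error signaling are assumptions for illustration:

```typescript
// Retry an API call, escalating the output-token limit after the first
// max_output_tokens failure, bounded by a recovery limit.
const DEFAULT_MAX_TOKENS = 8192;     // illustrative default
const ESCALATED_MAX_TOKENS = 64000;  // cf. ESCALATED_MAX_TOKENS
const RECOVERY_LIMIT = 3;            // cf. MAX_OUTPUT_TOKENS_RECOVERY_LIMIT

async function callWithRecovery(
  call: (maxTokens: number) => Promise<string>
): Promise<string> {
  let maxTokens = DEFAULT_MAX_TOKENS;
  for (let attempt = 0; attempt <= RECOVERY_LIMIT; attempt++) {
    try {
      return await call(maxTokens);
    } catch (err) {
      if ((err as Error).message !== "max_output_tokens") throw err; // not ours
      maxTokens = ESCALATED_MAX_TOKENS; // escalate once, keep retrying after
    }
  }
  throw new Error("recovery limit exceeded");
}
```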
Security implication: Recovery paths create secondary execution flows. Each retry, fallback, and compaction is a code path that must be secured independently. An attacker who can reliably trigger prompt-too-long conditions forces the agent through compaction — rewriting context under adversary influence.
3. The Tool Execution Model
Tools are what make an agent an agent rather than a chatbot. Claude Code's tool system is a concurrent execution engine with scheduling, permission gating, and real-time streaming.
Concurrency: Read-Only Parallel, Write Serial
The orchestration system in toolOrchestration.ts partitions tool calls into batches. Consecutive concurrency-safe (read-only) tools execute in parallel, up to 10 (toolOrchestration.ts:10):
```typescript
// toolOrchestration.ts:10
parseInt(process.env.CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY || '', 10) || 10
```
Non-concurrency-safe (write) tools run serially with exclusive access. If isConcurrencySafe throws, the tool is treated as non-concurrent (toolOrchestration.ts:101-107). Conservative default.
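The batching rule described above can be sketched as a simple partition. The `ToolCall` shape and function name are invented; only the discipline (consecutive reads group into a capped parallel batch, every write runs alone) follows the text:

```typescript
// Partition tool calls into batches: consecutive read-only calls share
// one parallel batch (up to the concurrency cap); writes run serially.
interface ToolCall { name: string; readOnly: boolean }

const MAX_CONCURRENCY = 10; // mirrors the env-var default cited above

function partition(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    const canJoin =
      call.readOnly &&
      last !== undefined &&
      last.every((c) => c.readOnly) &&
      last.length < MAX_CONCURRENCY;
    if (canJoin) {
      last!.push(call); // extend the current parallel read-only batch
    } else {
      batches.push([call]); // writes (and new runs of reads) start a batch
    }
  }
  return batches;
}
```

The conservative fallback noted above fits naturally here: anything whose safety cannot be determined is treated as `readOnly: false` and serialized.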
The Streaming Tool Executor
The StreamingToolExecutor (StreamingToolExecutor.ts:40) begins executing tools while the model is still streaming its response. As each tool_use block arrives from the API stream, it is immediately queued:
```
API Stream:  [text...] [tool_use_1] [text...] [tool_use_2] [tool_use_3]
                            |                      |            |
                        start exec             start exec   queue (wait)
```
The executor maintains a queue of TrackedTool objects with statuses: queued, executing, completed, yielded (StreamingToolExecutor.ts:19). When a Bash tool errors, a child abort controller fires to kill sibling subprocesses (StreamingToolExecutor.ts:46-48), but does NOT abort the parent — the query loop continues.
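The queue discipline can be sketched as follows. The status names mirror the cited `TrackedTool` states; the class itself, its serial drain chain, and the `run` callback are invented for illustration (the real executor also parallelizes read-safe tools):

```typescript
// Tools are enqueued the moment their tool_use block arrives from the
// stream; a drain chain executes them in order while the stream continues.
type Status = "queued" | "executing" | "completed";
interface TrackedTool { name: string; status: Status }

class StreamingExecutorSketch {
  private queue: TrackedTool[] = [];
  private draining: Promise<void> = Promise.resolve();

  constructor(private run: (name: string) => Promise<void>) {}

  // Called per tool_use block — before the full response exists.
  enqueue(name: string): void {
    const tool: TrackedTool = { name, status: "queued" };
    this.queue.push(tool);
    this.draining = this.draining.then(async () => {
      tool.status = "executing";
      await this.run(tool.name);
      tool.status = "completed";
    });
  }

  async done(): Promise<TrackedTool[]> {
    await this.draining;
    return this.queue;
  }
}
```

Note that `enqueue` returns immediately: execution is decoupled from arrival, which is exactly the race window the security implication below describes.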
Security implication: Tool execution begins before the full model response is available. This creates a race condition window where partially generated intent can trigger side effects before the model "finishes thinking." A carefully crafted context could cause the model to emit a dangerous tool call early in its response, executing before subsequent reasoning would have prevented it.
The Permission Check Cascade
Before any tool executes, it passes through a multi-layer cascade:
- Always-allow rules: Per-source rules that auto-approve
- Always-deny rules: Hard blocks, cannot be overridden
- Always-ask rules: Force user prompt regardless
- Permission mode: default, plan, auto, or bypassPermissions
- Automated checks: Classifier and hooks can pre-decide
- User prompt: The final fallback — ask the human
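A stripped-down precedence walk over that cascade might look like this. The rule shapes, string matching, and exact ordering are assumptions; the real evaluator aggregates rules from multiple sources with a far richer precedence chain:

```typescript
// Evaluate one tool call against a precedence-ordered rule cascade:
// deny beats everything, forced-ask beats allow, modes apply last.
type Decision = "allow" | "deny" | "ask";
interface Rules { deny: string[]; allow: string[]; alwaysAsk: string[] }

function evaluate(tool: string, rules: Rules, mode: string): Decision {
  if (rules.deny.includes(tool)) return "deny";     // hard block, unoverridable
  if (rules.alwaysAsk.includes(tool)) return "ask"; // forced user prompt
  if (rules.allow.includes(tool)) return "allow";   // auto-approve
  if (mode === "bypassPermissions") return "allow"; // mode-level override
  return "ask";                                     // final fallback: the human
}
```

Even this toy version shows why precedence bugs matter: swap any two of the first three checks and the security properties of the whole system change.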
Security implication: This is a multi-layer policy system with competing sources of truth — a classic setup for precedence bugs and policy bypass. Seven different rule sources feed into the decision. The "correct" answer depends on which source wins, and the precedence chain is complex enough that edge cases are inevitable (see Article 4).
4. Context Management — The Hidden Complexity
The SYSTEM_PROMPT_DYNAMIC_BOUNDARY
The system prompt is split at a marker (prompts.ts:114-115):
```
[Static sections]   <-- Cached by API (identical across all users)
__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__
[Dynamic sections]  <-- Recomputed each turn (memory, MCP, env info)
```
Static content — identity, tool instructions, coding guidelines — benefits from prompt caching across all users. Dynamic content — CLAUDE.md, MCP server instructions, environment info — can change without busting the cache.
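The caching property hinges on one invariant: everything before the marker must be byte-identical across turns. A sketch of that assembly, where the marker name matches the cited constant but the assembly functions are invented:

```typescript
// Assemble a system prompt around a static/dynamic boundary so the
// static prefix stays stable (and therefore cacheable) across turns.
const BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

function assemblePrompt(staticSections: string[], dynamicSections: string[]): string {
  return [...staticSections, BOUNDARY, ...dynamicSections].join("\n");
}

// Everything before the marker is the prefix the API can cache.
function cacheablePrefix(prompt: string): string {
  return prompt.slice(0, prompt.indexOf(BOUNDARY));
}
```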
The dynamic sections use a registry pattern (prompts.ts:491-555) separating cached sections from DANGEROUS_uncachedSystemPromptSection sections that recompute every turn. The function name is unusually honest: it marks injection points where dynamic content enters the prompt.
Context Injection
System context (context.ts:116-150): Git status (memoized at startup — a snapshot, not live), optional cache-breaking injection.
User context (context.ts:155-189): CLAUDE.md file contents and current date. CLAUDE.md loading is disabled in bare mode when no explicit --add-dir was provided. The loaded content is cached separately for the auto-mode classifier to read without creating an import cycle.
Security implication: Every dynamic injection point — CLAUDE.md, MCP instructions, memory — is a surface where untrusted content enters the system prompt. These are not bugs. They are architectural decisions that trade functionality for attack surface (see Article 2).
Maintaining Coherence
In long conversations (100+ turns), coherence survives through:
- Compaction summaries that preserve key facts, decisions, and file paths
- pendingToolUseSummary — post-hoc summaries of tool use blocks injected into subsequent iterations
- Memory prefetch (query.ts:301-304) — a side-query at the start of each user turn to find relevant memories
- Session persistence — transcripts written to disk for /resume across process restarts
5. What This Means for Agent Security
The Loop Is the Attack Surface
The while (true) loop at query.ts:307 is a trust boundary. On one side: the system prompt, the user's instructions, the tool definitions. On the other: the model's response — untrusted content that drives tool execution. The model decides which tools to call, with what arguments, in what order. The permission system mediates, but the loop itself will execute whatever the model requests, constrained only by the permission cascade and each tool's validation.
Every iteration converts tokens into actions. The compaction pipeline rewrites context. The streaming executor races ahead of model completion. The recovery system creates secondary execution paths. Each of these is a surface.
Tool Execution Creates Real Side Effects
When the model calls BashTool, it runs shell commands on the host. When it calls FileWriteTool, it writes to the filesystem. When it calls AgentTool, it spawns an entirely new agent with its own query loop, context, and tool access. These are not sandbox operations by default — they are real side effects on real systems, controlled by AI judgment and gated by a permission model that, in auto mode, can approve operations without human review.
What Comes Next
This article mapped the architecture. The remaining nine parts of this series dissect every layer:
- Part 2: System Prompts Are Not Strings — They're Pipelines — How multi-section, cached/uncached prompt assembly creates a programmable control plane
- Part 3: The AI Agent Attack Surface: Plugins, MCP, and Hooks — The three escalation planes that extend the agent beyond its built-in tools
- Part 4: Why AI Permission Systems Are the New Kernel Security — Mapping the permission model to OS security concepts
- Part 5: Inside a Real AI Guardrail System (And Where It Breaks) — Six layers of defense and six failure modes
- Part 6: Subagents Are Just Sandboxed Processes (And They Leak) — How child agents inherit more than intended
- Part 7: Multi-Surface Prompt Injection in Agent Systems — Five injection surfaces in one product
- Part 8: Agent Security Is the New Cloud Security — Why CISOs should apply cloud security thinking to agents
- Part 9: The Agent Security Top 10 — An evidence-based risk taxonomy from production code
- Part 10: Why You Don't Know How Many AI Agents Are Running — The visibility gap enterprises must close
Conclusion
The most important takeaway from reading this codebase: the attack surface is not the model. The attack surface is the loop.
The model is a prediction engine. The loop is what gives those predictions teeth — scheduling execution, managing memory, enforcing permissions, recovering from failures, and spawning child processes. Every security question about AI agents is ultimately a question about the loop.
If your security team evaluates AI agents using the mental model of "it's just an API wrapper," you are missing the architecture that determines your risk.
Scott Thornton is an AI security researcher at perfecXion.ai, specializing in defensive research on LLM and agent vulnerabilities. All analysis was conducted on lawfully obtained, publicly distributed npm package code in an authorized research environment.