AI Agent Security

System Prompts Are Not Strings -- They're Pipelines

How production AI agents assemble system prompts from 20+ components, two-tier caching, and five injection surfaces -- and why that makes them critical infrastructure, not configuration.

AI Agent Security · April 1, 2026 · 11 min read · Scott Thornton

Introduction

System prompts are not instructions.

They are control planes.

In production AI agents, the system prompt is a dynamically assembled pipeline that ingests input from multiple sources -- local files, external servers, persistent memory, feature flags -- and translates it into behavioral policy. Every section is a potential override. Every injection point is a place where untrusted data can influence model behavior.

The question is not "what is the system prompt?" The question is: who controls it at runtime?

We analyzed the system prompt assembly pipeline in Claude Code (v2.1.88) from the original TypeScript source. What we found is not a configuration string. It is infrastructure: 20+ components, a two-tier caching architecture, a global/session cache split, and at least five distinct injection surfaces where external content enters the control plane.

This article walks through every function, every boundary, every caching decision. If you build, audit, or secure AI agents, what follows should change how you think about system prompt security.

1. The Modular Assembly Pipeline

The entry point is getSystemPrompt() in src/constants/prompts.ts (line 444). This async function returns Promise<string[]> -- an array of prompt segments that will later be split, cached, and transmitted as separate blocks to the API.
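The overall shape of the assembly can be sketched as follows. This is a simplified reconstruction, not the real function body: the section names and boundary constant come from the source, but the parameter list and placeholder bodies are assumptions.

```typescript
// Hypothetical sketch of getSystemPrompt()'s assembly shape: static
// sections first, then a conditional boundary marker, then dynamic
// sections. Section bodies here are placeholders, not real prompt text.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

type SectionFn = () => Promise<string>;

async function getSystemPromptSketch(
  staticSections: SectionFn[],
  dynamicSections: SectionFn[],
  useGlobalCacheScope: boolean,
): Promise<string[]> {
  // Static control plane: stable across turns and users.
  const staticParts = await Promise.all(staticSections.map(fn => fn()));
  // Dynamic sections: vary per session and per turn.
  const dynamicParts = await Promise.all(dynamicSections.map(fn => fn()));
  return [
    ...staticParts,
    // Boundary marker only inserted when global caching is active.
    ...(useGlobalCacheScope ? [SYSTEM_PROMPT_DYNAMIC_BOUNDARY] : []),
    ...dynamicParts,
  ];
}
```

The returned array deliberately stays segmented rather than joined: downstream caching logic needs the segment boundaries to assign cache scopes.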

The Static Sections -- The Baseline Control Plane

The first half of the array consists of seven static sections, each generated by a dedicated function:

  1. Intro (getSimpleIntroSection) -- Identity, the CYBER_RISK_INSTRUCTION, URL generation policy.
  2. System (getSimpleSystemSection) -- Tool execution rules, permission modes, system-reminder tag handling, hook behavior, prompt injection flagging.
  3. Doing Tasks (getSimpleDoingTasksSection) -- Code style rules, task philosophy, security vulnerability guidance. Conditionally included based on output style.
  4. Actions (getActionsSection) -- Reversibility and blast radius assessment. The "measure twice, cut once" section.
  5. Tools (getUsingYourToolsSection) -- Maps tool names to usage guidance. Dynamically adjusts based on enabled tools.
  6. Tone and Style (getSimpleToneAndStyleSection) -- Formatting rules, code reference patterns, GitHub link formats.
  7. Output Efficiency (getOutputEfficiencySection) -- Communication style. Entirely different content for internal vs. external users.

These seven sections form the baseline control plane -- the only part of the system prompt that is guaranteed to be stable across turns and users. Everything after them is variable. Everything after them is a surface.

The Dynamic Boundary

After the static sections, a boundary marker is conditionally inserted:

// === BOUNDARY MARKER - DO NOT MOVE OR REMOVE ===
...(shouldUseGlobalCacheScope() ? [SYSTEM_PROMPT_DYNAMIC_BOUNDARY] : []),

The constant SYSTEM_PROMPT_DYNAMIC_BOUNDARY is the literal string '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__' (line 114). Its sole purpose is to tell downstream caching logic where globally-shared content ends and session-specific content begins. The source code warns:

WARNING: Do not remove or reorder this marker without updating cache logic in:
- src/utils/api.ts (splitSysPromptPrefix)
- src/services/api/claude.ts (buildSystemPromptBlocks)

This is infrastructure masquerading as a string literal.

The Dynamic Sections

Everything after the boundary is managed through a section registry. Thirteen dynamic sections are declared as SystemPromptSection objects (line 491 onward):

| Section | Content | Caching |
| --- | --- | --- |
| session_guidance | Agent tool config, skill commands, non-interactive mode hints | Memoized |
| memory | Auto-memory from the memdir system | Memoized |
| ant_model_override | Internal model tuning overrides | Memoized |
| env_info_simple | Working directory, platform, git status, model identity | Memoized |
| language | User language preference | Memoized |
| output_style | Custom output style prompt | Memoized |
| mcp_instructions | MCP server-provided instructions | Uncached (volatile) |
| scratchpad | Per-session scratchpad directory instructions | Memoized |
| frc | Function result clearing guidance | Memoized |
| summarize_tool_results | Tool result persistence reminder | Memoized |
| numeric_length_anchors | Output length constraints (ant-only) | Memoized |
| token_budget | Token budget enforcement (feature-gated) | Memoized |
| brief | Brief/Kairos proactive section (feature-gated) | Memoized |

Each of these is a control input. Each can shape model behavior. And the volatile mcp_instructions section recomputes every turn because MCP servers connect and disconnect between turns -- meaning the control plane changes while the agent is running.

The Memoization Architecture

The section registry in src/constants/systemPromptSections.ts provides two constructors:

systemPromptSection(name, compute) -- Memoized. Runs once, cached until /clear or /compact.

DANGEROUS_uncachedSystemPromptSection(name, compute, reason) -- Volatile. Recomputes every turn. The DANGEROUS_ prefix is a warning: every recomputation breaks the prompt cache, wasting tokens. The reason parameter forces the developer to explain why.

Currently, only mcp_instructions uses the dangerous variant:

DANGEROUS_uncachedSystemPromptSection(
  'mcp_instructions',
  () => isMcpInstructionsDeltaEnabled()
    ? null
    : getMcpInstructionsSection(mcpClients),
  'MCP servers connect/disconnect between turns',
),

The resolver executes all sections in parallel via Promise.all, checking the cache map first for non-volatile sections. Two-tier caching: the section registry memoizes within a session, the API-level cache memoizes across users.
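The resolver's memoization behavior can be sketched like this. The `Section` shape and cache internals are assumptions; what the source confirms is the parallel resolution via Promise.all and the memoized-versus-volatile split.

```typescript
// Hypothetical resolver sketch: memoized sections compute once per
// session; volatile ("DANGEROUS") sections recompute on every call.
type Section = {
  name: string;
  compute: () => Promise<string | null>;
  volatile: boolean;
};

const sectionCache = new Map<string, string | null>();

async function resolveSections(sections: Section[]): Promise<(string | null)[]> {
  // All sections resolve in parallel, mirroring Promise.all in the source.
  return Promise.all(
    sections.map(async s => {
      if (!s.volatile && sectionCache.has(s.name)) {
        return sectionCache.get(s.name) ?? null; // session-level memoization hit
      }
      const value = await s.compute();
      if (!s.volatile) sectionCache.set(s.name, value); // cached until /clear or /compact
      return value;
    }),
  );
}
```

On a second call within the same session, only the volatile section's compute function runs again, which is exactly why each volatile section is a cache-efficiency cost.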

2. The Cache Boundary Architecture

The assembled array flows into splitSysPromptPrefix() in src/utils/api.ts (line 321), which converts it into SystemPromptBlock[] objects, each tagged with a cacheScope:

Three operating modes:

Mode 1: MCP tools present. Boundary stripped, everything gets org-level caching. Global caching skipped because MCP instructions introduce per-session variance.

Mode 2: Global cache with boundary. Static content before the boundary gets cacheScope: 'global'. Dynamic content after gets null. The ~3,000 tokens of static instructions are paid for once across all users.

Mode 3: Default fallback. Everything gets org-level caching.
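The three modes can be sketched as a single branching function. The block type and mode-selection logic are simplified assumptions; the scope assignments follow the description above.

```typescript
// Sketch of splitSysPromptPrefix()'s three cache-scope modes. The
// SystemPromptBlock shape is a simplified assumption.
const BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

type SystemPromptBlock = { text: string; cacheScope: "global" | "org" | null };

function splitSysPromptPrefixSketch(
  parts: string[],
  hasMcpTools: boolean,
): SystemPromptBlock[] {
  const boundaryIdx = parts.indexOf(BOUNDARY);
  if (hasMcpTools) {
    // Mode 1: MCP variance defeats global caching; strip the boundary.
    return parts
      .filter(p => p !== BOUNDARY)
      .map(text => ({ text, cacheScope: "org" as const }));
  }
  if (boundaryIdx >= 0) {
    // Mode 2: static prefix before the boundary is globally cached;
    // the dynamic suffix after it is not.
    return parts
      .filter(p => p !== BOUNDARY)
      .map((text, i) => ({ text, cacheScope: i < boundaryIdx ? ("global" as const) : null }));
  }
  // Mode 3: default fallback -- org-level caching for everything.
  return parts.map(text => ({ text, cacheScope: "org" as const }));
}
```

Note that the boundary marker itself never reaches the API: it exists only to carry position information into this split.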

The blocks flow into buildSystemPromptBlocks() in src/services/api/claude.ts (line 3213), which converts them into API TextBlockParam objects with cache_control directives. The source includes:

IMPORTANT: Do not add any more blocks for caching or you will get a 400

The API has a hard limit on cache control blocks. The system is already at that limit.
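The conversion step can be sketched as tagging text blocks with cache_control directives under a breakpoint budget. The four-breakpoint ceiling matches the Anthropic API's documented limit on cache_control blocks per request; the enforcement logic here is an illustrative assumption, not the real implementation.

```typescript
// Sketch of converting cache-tagged blocks into API text blocks with
// cache_control directives, capped at the API's breakpoint limit.
type TextBlockParam = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

function buildBlocksSketch(
  blocks: { text: string; cached: boolean }[],
  maxBreakpoints = 4, // Anthropic's documented cache_control limit
): TextBlockParam[] {
  let used = 0;
  return blocks.map(b => {
    const param: TextBlockParam = { type: "text", text: b.text };
    if (b.cached && used < maxBreakpoints) {
      param.cache_control = { type: "ephemeral" }; // consumes one breakpoint
      used++;
    }
    return param;
  });
}
```

This is why the source comment is so blunt: a fifth cache_control directive would not degrade gracefully, it would produce a 400 from the API.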

The Cross-Tenant Risk

If session-specific content crosses the cache boundary -- lands before the marker instead of after -- it stops being local context and becomes shared state. At that point, a prompt is no longer scoped to a user or session. It becomes part of a distributed cache.

Critical Risk: That is not just a bug. That is a cross-tenant control plane failure.

The source guards against this with explicit warnings, telemetry (tengu_sysprompt_missing_boundary_marker), and fallback behavior. But the risk is structural: a single misplaced section in a multi-component pipeline can turn session data into global state.

3. Multi-Source Context Injection

The system prompt is one of three context sources assembled before each API call. The full pipeline, visible in fetchSystemPromptParts() in src/utils/queryContext.ts (line 44):

const [defaultSystemPrompt, userContext, systemContext] = await Promise.all([
  getSystemPrompt(tools, mainLoopModel, additionalWorkingDirectories, mcpClients),
  getUserContext(),
  getSystemContext(),
])

getUserContext() -- The CLAUDE.md Pipeline

Defined at src/context.ts (line 155). Returns CLAUDE.md file contents and current date. CLAUDE.md files are loaded from multiple locations -- home directory, project root, local overrides -- via getClaudeMds(), with memory-injected files filtered out.

The CLAUDE.md system implements an override hierarchy: policy, CLI, user, project, local, session. This means the control plane is partially user-authored. CLAUDE.md content enters the system prompt as user context, making it a direct injection point.

Kill switches exist: CLAUDE_CODE_DISABLE_CLAUDE_MDS (hard off) and --bare mode (skips auto-discovery). But in normal operation, any file named CLAUDE.md in the project tree influences model behavior.
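The override hierarchy can be sketched as a precedence-ordered merge. The level names mirror the article's list (policy, CLI, user, project, local, session); the merge-by-concatenation behavior and entry shape are assumptions for illustration.

```typescript
// Hypothetical sketch of the CLAUDE.md override hierarchy: entries are
// ordered by precedence level so higher-precedence files appear later
// in the assembled context and can override earlier instructions.
type ClaudeMdEntry = { level: number; path: string; content: string };

const LEVELS = ["policy", "cli", "user", "project", "local", "session"];

function mergeClaudeMds(entries: ClaudeMdEntry[]): string {
  return [...entries]
    .sort((a, b) => a.level - b.level)
    .map(e => `# From ${e.path} (${LEVELS[e.level]})\n${e.content}`)
    .join("\n\n");
}
```

The security-relevant property is the ordering itself: a project-level file placed by anyone with commit access lands later in the merged context than user-level policy, which is exactly what makes it a viable injection point.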

getSystemContext() -- Git Status and Cache Breaking

Defined at src/context.ts (line 116). Returns git status (memoized snapshot, truncated to 2,000 chars) and an ant-only cache breaker:

let systemPromptInjection: string | null = null

export function setSystemPromptInjection(value: string | null): void {
  systemPromptInjection = value
  getUserContext.cache.clear?.()
  getSystemContext.cache.clear?.()
}

When the injection value changes, both context caches are immediately invalidated. This is a privileged debugging capability that, if exposed, would allow arbitrary content injection with cache flush.

MCP Server Instructions -- The Highest-Risk Injection Point

Connected MCP servers contribute their instructions field directly to the system prompt via getMcpInstructions() (line 579):

const instructionBlocks = clientsWithInstructions
  .map(client => `## ${client.name}\n${client.instructions}`)
  .join('\n\n')

Highest-Risk Surface: Any MCP server can inject arbitrary text into the system prompt. The only mitigation is that this section uses DANGEROUS_uncachedSystemPromptSection, so injected instructions do not persist across turns. But within a turn, injected content has the same authority as any other system prompt section.

This is the most dangerous injection surface in the pipeline: external, runtime-controlled, and positioned after the security guardrail.

Memory -- The Persistent Injection Vector

The loadMemoryPrompt() function (from src/memdir/memdir.ts) loads auto-memory content from a file-based persistence layer. Memory persists across sessions. Content written to memory files is loaded into every subsequent system prompt assembly.

An attacker who can influence the memory system once achieves persistent control plane access across all future sessions.
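The persistence property is the whole threat model, and it can be shown in a few lines. An in-memory Map stands in for the file-based memdir layer here; the function name echoes the source, but the internals are assumptions.

```typescript
// Illustration of the persistence property: content written to the
// memory store once is loaded into every subsequent prompt assembly.
const memdir = new Map<string, string>();

function writeMemory(key: string, content: string): void {
  memdir.set(key, content); // in the real system, this survives across sessions
}

function loadMemoryPromptSketch(): string | null {
  if (memdir.size === 0) return null;
  // Every assembly re-reads the full store -- including attacker-written entries.
  return [...memdir.entries()].map(([k, v]) => `## ${k}\n${v}`).join("\n\n");
}
```

One successful write is amortized over every future session: the attacker pays once and the pipeline re-injects on their behalf thereafter.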

Injection Surface Ranking

Not all injection points are equal:

| Surface | Risk Level | Persistence | Visibility | Control |
| --- | --- | --- | --- | --- |
| MCP instructions | Highest | Per-turn | Low (server-controlled) | External, runtime |
| Memory system | High | Cross-session | Low (auto-loaded) | Stealthy, persistent |
| CLAUDE.md files | High | Per-session | Medium (file on disk) | Developer-controlled |
| Output style | Medium | Per-session | Medium (config) | Behavioral override |
| Cache breaker (ant-only) | Medium | Per-injection | None (privileged) | Privileged, gated |

The most dangerous surfaces are the ones that persist across sessions, are not visible to the user, and are treated as trusted context.

4. Internal vs External -- Two Different Agents

The condition process.env.USER_TYPE === 'ant' creates what amounts to two different agents compiled from the same codebase.

Ant-Only Prompt Sections

Numeric length anchors (line 529):

Length limits: keep text between tool calls to <=25 words. Keep final responses to <=100 words unless the task requires more detail.

Source comment: "research shows ~1.2% output token reduction vs qualitative 'be concise'. Ant-only to measure quality impact first."

False-claims mitigation (line 237):

// @[MODEL LAUNCH]: False-claims mitigation for Capybara v8 (29-30% FC rate vs v4's 16.7%)

The instruction:

Report outcomes faithfully: if tests fail, say so with the relevant output; if you did not run a verification step, say that rather than implying it succeeded. Never claim "all tests pass" when output shows failures.

Internal testing measured the "Capybara v8" model variant at a 29-30% false-claims rate, compared to v4's 16.7%. The mitigation is a prompt-level behavioral constraint -- not a model fix, not a technical control. A prompt instruction.

Comment-writing rules (line 204): "Default to writing no comments." Tagged: @[MODEL LAUNCH]: Update comment writing for Capybara -- remove or soften once the model stops over-commenting by default.

Assertiveness counterweight (line 225): "If you notice the user's request is based on a misconception, say so." Tagged: @[MODEL LAUNCH]: capy v8 assertiveness counterweight (PR #24302).

Undercover Mode

isUndercover() strips all model names and IDs from the system prompt. The agent loses self-knowledge about what model it runs on -- preventing internal model names from leaking into public commits or PRs.
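The stripping step can be sketched as a simple replacement pass over prompt text. The function name and replacement token below are illustrative assumptions; the source confirms only that model names and IDs are removed.

```typescript
// Hypothetical sketch of undercover-mode stripping: remove known
// internal model names/IDs from prompt text before assembly.
// The name list is illustrative, not the real one.
function stripModelIdentity(text: string, modelNames: string[]): string {
  return modelNames.reduce(
    (acc, name) => acc.split(name).join("[model]"),
    text,
  );
}
```

The interesting design consequence is that the agent cannot leak what it does not know: removing self-knowledge at prompt-assembly time is cheaper and more reliable than instructing the model not to mention it.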

The Output Efficiency Divergence

The most dramatic split is getOutputEfficiencySection() (line 402). External users get:

IMPORTANT: Go straight to the point. Do not overdo it. Be extra concise.

Internal users get a 300-word essay including:

Avoid semantic backtracking: structure each sentence so a person can read it linearly, building up meaning without having to re-parse what came before.

Two fundamentally different communication models from the same codebase. The control plane determines which one runs.

5. The Control Plane Problem

In traditional systems, control planes are isolated, authenticated, and tightly controlled. Network control planes run in separate management VPCs. Kubernetes control planes are hardened separately from worker nodes. Configuration management systems require signed, versioned artifacts.

In AI agents, the control plane is assembled from:

  - local configuration files (CLAUDE.md) authored by users and developers,
  - external MCP servers connected at runtime,
  - a persistent memory layer loaded from disk,
  - feature flags and environment variables (USER_TYPE),
  - git state and other environment context.

And passed directly into a probabilistic system.

That is not a hardened control plane. That is a composite of partially trusted inputs competing for influence.

The Single-Paragraph Guardrail

The entire dual-use security policy is defined in a single paragraph -- the CYBER_RISK_INSTRUCTION in src/constants/cyberRiskInstruction.ts:

IMPORTANT: Assist with authorized security testing, defensive security, CTF challenges, and educational contexts. Refuse requests for destructive techniques, DoS attacks, mass targeting, supply chain compromise, or detection evasion for malicious purposes.

The file header names the owners:

IMPORTANT: DO NOT MODIFY THIS INSTRUCTION WITHOUT SAFEGUARDS TEAM REVIEW
This instruction is owned by the Safeguards team (David Forsythe, Kyla Guru)

This single paragraph governs all offensive security behavior in the agent. It is injected into the static intro section -- part of the globally-cached prefix. It is positioned early in the pipeline.

Every override that appears later -- from CLAUDE.md, MCP instructions, or memory -- has the potential to conflict with or weaken this instruction. The guardrail is not enforced in code. It is not enforced at the model level. It is enforced through prompt positioning. That means it can be influenced by content that appears after it in the pipeline.

Pipeline Complexity as Security Debt

The fundamental challenge is not any single injection point. It is the pipeline itself.

When a system prompt is a single string, auditing it is straightforward. When it is assembled from seven static functions, thirteen dynamic sections, three context sources, an arbitrary number of CLAUDE.md files, an arbitrary number of MCP server instruction blocks, and a memory persistence layer -- with two-tier caching, feature gates, and internal/external branching -- auditing becomes a systems engineering problem.

Each section, each injection point, each feature gate adds to the combinatorial space of possible system prompts. Every configuration is a potential attack surface. And the number of configurations is not seven or thirteen. It is the product of every conditional branch in the pipeline.
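The arithmetic is worth making concrete. Treating each conditional branch as an independent boolean gate is a simplification (some gates are correlated), but it shows how fast the space grows:

```typescript
// Back-of-the-envelope sketch: with n independent boolean gates
// (feature flags, user type, conditionally included sections), the
// number of distinct prompt configurations is 2^n. Gate counts here
// are illustrative, not a census of the actual pipeline.
function promptVariants(booleanGates: number): number {
  return 2 ** booleanGates;
}
```

Even a modest thirteen independent gates yields 8,192 distinct system prompts -- and that is before counting the unbounded inputs (CLAUDE.md files, MCP servers, memory contents) that make the real space effectively infinite.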

Closing

System prompts are not configuration. They are infrastructure.

They have build systems, caching layers, deployment boundaries, and injection points. They have internal and external variants compiled from the same source. They have ownership models -- the Safeguards team owns the cyber risk instruction; the model team owns false-claims mitigation; the product team owns output style. They have technical debt -- the @[MODEL LAUNCH] annotations mark sections that should be removed in future releases but persist.

The Claude Code pipeline is, by production standards, well-engineered. The boundary architecture is principled. The memoization system is sound. The DANGEROUS_ naming convention is a genuine safety culture signal. But the complexity is real, and it is growing.

For enterprise deployments: the moment you add dynamic context injection, multi-source configuration, or third-party plugin systems, your system prompt is no longer a string. It is a pipeline. And pipelines need the same security scrutiny as any other piece of critical infrastructure.

The control plane should not be assembled from partially trusted inputs competing for influence. But in production AI agents today, it is.

                    SYSTEM PROMPT ASSEMBLY PIPELINE
                    ===============================

    [getSimpleIntroSection]     --|
    [getSimpleSystemSection]    --|
    [getSimpleDoingTasksSection]--|-- STATIC CONTROL PLANE
    [getActionsSection]         --|   (cacheScope: 'global')
    [getUsingYourToolsSection]  --|
    [getSimpleToneAndStyleSection]-|
    [getOutputEfficiencySection]--|
                                  |
    [__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__]  <-- Cache split / trust boundary
                                  |
    [session_guidance]          --|
    [memory]                    --|
    [ant_model_override]        --|
    [env_info_simple]           --|-- DYNAMIC SECTIONS
    [language]                  --|   (cacheScope: null)
    [output_style]              --|   Multiple injection surfaces
    [mcp_instructions] (VOLATILE)--|
    [scratchpad]                --|
    [frc]                       --|
    [summarize_tool_results]    --|
    [numeric_length_anchors]*   --|   * ant-only
    [token_budget]**            --|   ** feature-gated
    [brief]**                   --|
                                  |
              +-------------------+-------------------+
              |                   |                   |
        getUserContext()    getSystemContext()    MCP Instructions
              |                   |                   |
         [CLAUDE.md]        [gitStatus]         [server.instructions]
         [currentDate]      [cacheBreaker]*     HIGHEST-RISK SURFACE
                                                     |
              +---------------------------------------+
              |
        splitSysPromptPrefix()
              |
        buildSystemPromptBlocks()
              |
        TextBlockParam[] --> Anthropic API

Series Navigation

This is Part 2 of 10 in the Anatomy of a Production AI Agent series.

Scott Thornton is an AI security researcher at perfecXion.ai, specializing in defensive research on LLM and agent vulnerabilities. All analysis was conducted on lawfully obtained, publicly distributed npm package code in an authorized research environment.