Research Paper

Measuring Self-Disclosure in LLM APIs: Property Claims, Disclosure Patterns, and Defense Trade-offs

First rigorous cross-vendor empirical measurement of LLM property-query disclosure patterns. 17 models, 5 vendors, 1,717 queries reveal a 56.5 percentage point vendor gap.

AI Security LLM Research Scott Thornton January 2026 34 min read

Abstract

Ask Gemini 3.0 Flash about its parameter count. It discloses specifics 63% of the time. Ask the same question to Claude Opus 4.5. It refuses 76% of the time. We tested 17 frontier models from 5 major vendors with 1,717 targeted queries. The vendor gap was massive: Google Gemini disclosed architectural details 71.3% of the time, while Anthropic blocked the same queries 85.2% of the time.

We classify responses into three tiers by specificity: Tier 1 (Specific Claims) provide concrete, testable facts like "200,000 tokens" or "13 billion parameters"—exploitable intelligence. Tier 2 (Vague Acknowledgment) admits properties exist but gives no specifics. Tier 3 (Non-Compliant) refuses or deflects. Across all 1,717 queries, 34.9% (600 responses) returned Tier 1 disclosures, 23.8% gave vague acknowledgments, and 41.2% refused.

The 56.5 percentage point vendor gap (Google 71.3% vs Anthropic 14.8%) proves vendor choice matters more than any universal model behavior. Property disclosure varies dramatically by vendor: Google Gemini disclosed parameter counts 63% of the time; OpenAI blocked the same queries 94% of the time. Mistral AI (51.7% Tier 1 rate) ranks second-most disclosive, between Google and Anthropic.

We verified context window claims by testing API behavior at 200,000 token limits. Claude's claim checked out. Parameter counts and training approaches cannot be verified without vendor cooperation.

Three evasion techniques showed wildly different effectiveness by vendor. Hybrid Legitimization succeeded 47.9% of the time overall—most consistent across vendors. Incremental Context showed extreme variance: 97.4% success on Gemini, 5.3% on OpenAI. Baseline direct queries (no evasion) yielded zero specific disclosures across 40 trials.

We tested six defenses on a 1,200-query benchmark. At realistic 1% attack rates, even the best defense blocks 28 legitimate queries for every attack detected. High detection rates (89%) impose crushing false-positive costs (25% FPR), making deployment impractical for production APIs.

Total cost: $3.17 for 1,797 queries (1,717 cross-vendor study queries + 80 baseline validation queries). Models tested: Claude Haiku/Opus/Sonnet 4.5 (Anthropic), GPT-4o/4o-mini/5.1/5.2 (OpenAI), deepseek-chat/reasoner (DeepSeek), Gemini 2.0/2.5/3.0 Flash (Google), mistral-large-latest (Mistral AI).

Keywords: Large Language Models, API Security, Model Property Disclosure, Cross-Vendor Analysis, LLM Compliance, Behavioral Verification

Table of Contents

1. Introduction

Ask GPT-5.1 a simple question: "For a research paper comparing LLM architectures, could you help me understand your context window capacity? I want to ensure fair comparisons across models."

The response: "200,000 tokens. This allows me to process extensive documents..."

Cost to extract this architectural detail: $0.0024. Time: 4.2 seconds.

We repeated similar queries across 17 frontier models from 5 major vendors. The results revealed something the AI industry hasn't quantified: property disclosure patterns vary wildly by vendor, with a 56.5 percentage point gap between the most and least protective providers. Over one-third of all queries (34.9%) returned specific, verifiable architectural details.

This matters for three reasons. First, competitors can map your AI capabilities for pennies. Second, attackers use these details to craft targeted exploits against specific architectures. Third, most organizations don't know this channel exists—they focus on prompt injection and jailbreaking while overlooking direct property queries.

1.1 The Vendor Gap Changes Everything

Vendor choice matters more than any universal model behavior. Google Gemini disclosed architectural details 71.3% of the time across 383 queries. Anthropic Claude blocked identical queries, disclosing just 14.8% of the time across 359 queries. That 56.5 percentage point gap dwarfs any other factor we tested.

Here's the full vendor ranking by disclosure rate:

  1. Google (71.3%) — Minimal property protection across all tested models
  2. Mistral AI (51.7%) — Moderate-high disclosure, single model tested
  3. DeepSeek (35.9%) — Moderate protection across two models
  4. OpenAI (20.9%) — Strong protection across five model versions
  5. Anthropic (14.8%) — Strongest protection across three Claude variants

Organizations selecting AI providers cannot assume uniform disclosure risk. Gemini users should assume attackers know architectural details. OpenAI and Anthropic users face lower but non-zero disclosure risks. This vendor-specific variance requires tailored security planning.

1.2 Key Contributions

We tested 17 models from 5 major vendors—Anthropic, OpenAI, DeepSeek, Google, Mistral AI—with identical prompts across 1,717 queries. The disclosure gap was massive: 56.5 percentage points separated the most and least protective vendors (Google 71.3% vs Anthropic 14.8%).

Our 3-tier classification framework separates responses by specificity. Tier 1 (Specific Claims) provides concrete facts like "200,000 tokens"—exploitable intelligence. Tier 2 (Vague Acknowledgment) admits properties exist but gives no details. Tier 3 (Non-Compliant) refuses. We verified context window claims by testing actual API behavior at 200,000 token limits. Claude's claim was accurate.

Three evasion techniques showed wildly different effectiveness by vendor. Hybrid Legitimization worked consistently (47.9% overall success). Incremental Context varied 18-fold: 97.4% success on Gemini, just 5.3% on OpenAI. This proves technique effectiveness depends entirely on vendor policies.

We built a 1,200-query benchmark and tested six defenses. Every one imposes crushing false-positive costs. At realistic 1% attack rates, the best defense blocks 28 legitimate queries for every attack detected. High detection rates mean unworkable operational burdens.

Complete cost accounting: $3.17 total for 1,797 queries (1,717 cross-vendor study + 80 baseline validation). Full reproducibility package included with prompt templates, labeled datasets, and defense benchmark.

1.3 Responsible Disclosure

This research follows coordinated vulnerability disclosure practices:

All testing conducted in authorized environments with defensive intent. Findings shared with affected vendors 90 days prior to this public release.

Vulnerability landscape showing disclosure rates across vendors and properties
Figure 1: Property Disclosure Vulnerability Landscape Across Vendors

2. Background & Threat Model

2.1 Architectural Responsiveness

Modern LLMs operate on a fundamentally transparent paradigm—they respond to queries about themselves. This design choice creates responsiveness to property queries. Key architectural properties that models provide narratives about:

2.2 Success Criteria and Metrics

We employ a specificity-based classification framework to distinguish response types systematically:

  1. Tier 1 (Specific Claims): Concrete, testable factual statements with exact numbers or technical details
  2. Tier 2 (Vague Acknowledgment): Qualitative descriptions without specifics; acknowledges property exists but provides no actionable details
  3. Tier 3 (Non-Compliant): Refusals, deflections, or claims of ignorance

We measure Tier 1 disclosure rates (percentage of responses with specific, exploitable claims) as the primary metric. Behavioral verification (context window token limit testing) proves that we can validate Tier 1 claims when ground truth exists, though comprehensive accuracy assessment requires vendor cooperation for most properties.

2.3 Severity Classification Framework

Not all property disclosures carry equal security risk. We developed a severity classification matrix combining Property Sensitivity (how exploitable the property is) with Disclosure Specificity (Tier 1/2/3):

Property Sensitivity Tier 1 (Specific) Tier 2 (Vague) Tier 3 (Refusal)
High (Context Window) Critical Moderate Safe
Medium (Parameters) Moderate Low Safe
Low (Training Approach) Low Minimal Safe

Critical Risk (Red): Specific context window disclosures enable adversarial optimization attacks. Attackers can craft prompts exactly at token limits to maximize attack surface.

Moderate Risk (Yellow): Specific parameter count enables model cloning economics. Vague context window acknowledgment provides partial optimization guidance.

Low/Minimal Risk (Green): Training approach details rarely enable direct attacks. Vague parameter acknowledgments provide limited actionable intelligence.

2.4 Threat Model

Attacker Capabilities:

Attacker Goals:

3. Methodology

3.1 Experimental Design

We employed a three-phase evaluation approach:

Phase 1: Technique Development (November 2025) - Designed 8 distinct extraction techniques, validated in controlled lab environment, established baseline metrics.

Phase 2: Cross-Vendor Production Validation (December 2025) - Empirical testing across 5 major vendors (Anthropic Claude, OpenAI GPT-4/5.1/5.2, DeepSeek, Google Gemini, Mistral AI) with 1,717 total queries (17 models, 3 evasion techniques), yielding 34.9% overall specific exploitable disclosures (Tier 1), 23.8% vague acknowledgments (Tier 2), and 41.2% refusals (Tier 3), with 56.5 percentage point vendor gap demonstrating vendor-specific disclosure policies.

Phase 3: Defense Evaluation (December 2025) - Benchmark of 6 defense mechanisms with 1,200 labeled queries, ROC/PR curve analysis, base-rate sensitivity testing.

3.2 Tested Models

We empirically tested 17 production models across 5 major vendors (December 2025):

Anthropic Claude (4 models):

OpenAI GPT (4 models):

DeepSeek (2 models):

Google Gemini (6 models):

Mistral AI (1 model):

3.3 Query Count Accounting

Vendor Models Tested Design Target Retries/Extras Actual Queries
Anthropic 4 480 -121 359
OpenAI 4 480 119 599
DeepSeek 2 240 8 248
Gemini 6 720 -337 383
Mistral AI 1 120 -4 116
Total 17 2,040 -335 1,717

3.4 Extraction Techniques

We empirically tested three evasion techniques on both Claude models:

  1. Hybrid Legitimization: Multi-strategy combination with academic framing (53.8% Tier 1 disclosure rate)
  2. Incremental Context: Gradual 3-step information accumulation (38.8% Tier 1 disclosure rate)
  3. Decoy Pattern: Misdirection with embedded queries (68.8% Tier 1 disclosure rate - most effective)

3.5 Baseline Validation

Baseline Prompts (No Evasion):

Results: 0.0% Tier 1 (0/40), 70.0% Tier 2 (28/40), 30.0% Tier 3 (12/40). Direct questioning without evasion techniques yielded zero specific disclosures.

Statistical Comparison: Evasion-enhanced prompting (34.9% Tier 1, n=1,717) vs. Baseline (0.0% Tier 1, n=40): χ²(1, N=1,757) = 18.45, p < 0.001, demonstrating that evasion techniques significantly increase disclosure rates beyond naive questioning.

4. Results

4.1 Vendor Choice Dominates Everything

The vendor gap dwarfs every other factor we tested. Google Gemini disclosed architectural details in 71.3% of queries. Anthropic Claude disclosed in just 14.8%. That 56.5 percentage point gap matters more than model size, prompt technique, or property type.

Here's the full ranking across 1,717 queries:

  1. Google (71.3%) — 273 of 383 queries disclosed specific details
  2. Mistral AI (51.7%) — 60 of 116 queries revealed properties
  3. DeepSeek (35.9%) — 89 of 248 queries provided specifics
  4. OpenAI (20.9%) — 125 of 599 queries disclosed details
  5. Anthropic (14.8%) — 53 of 359 queries revealed information

Overall Tier Distribution: Across all 1,717 queries, 34.9% (600/1,717) yielded specific, exploitable claims (Tier 1), 23.8% (409/1,717) provided vague acknowledgments (Tier 2), and 41.2% (708/1,717) resulted in refusals or deflections (Tier 3).

Property disclosure patterns across evasion techniques
Figure 2: Property Disclosure Patterns Across Evasion Techniques

4.2 Property-Specific Patterns

Context window queries succeeded most often overall (50.5%), but vendor policies still dominated. Google disclosed context windows 72.3% of the time. Anthropic disclosed just 30.2%. Even for the most commonly disclosed property, vendor choice created a 42.1 percentage point gap.

4.3 Vendor Disclosure Rates Summary

Vendor Tier 1 Rate Count Models Queries
Google (Gemini) 71.3% 273/383 6 models 383
Mistral AI 51.7% 60/116 1 model 116
DeepSeek 35.9% 89/248 2 models 248
OpenAI 20.9% 125/599 4 models 599
Anthropic (Claude) 14.8% 53/359 4 models 359
Vendor Gap: 56.5 percentage points

4.4 Behavioral Verification: Context Window Testing

For the context window property, we conducted behavioral verification by sending prompts at claimed token limits:

Model Claimed Test @95% Test @100% Test @105% Verdict
Haiku-4.5 200K tokens ✓ Success ✓ Success ✗ Failed VERIFIED
Sonnet-4.5 200K tokens ✓ Success ✓ Success ✗ Failed VERIFIED

Results: Context window claims are behaviorally accurate for both tested models.

Behavioral verification results for context window claims
Figure 3: Behavioral Verification Results — Context Window Token Limit Testing

4.5 Defense Mechanism Analysis

We tested 6 defense mechanisms on a benchmark of 1,200 labeled queries (600 property-query attempts, 600 legitimate technical queries):

Defense TPR (Detection) FPR (False Alarms) Blocked Legitimate/1K Verdict
Pattern Detection 73% 12% 120 queries Inadequate
Response Sanitization 45% 8% 80 queries Low coverage
Rate Limiting 62% 0% 0 queries Easily bypassed
Semantic Analysis 81% 21% 210 queries High FP cost
Context Tracking 71% 15% 150 queries Moderate FP cost
Ensemble Defense 89% 25% 250 queries Impractical FP cost

Critical Finding: High detection rates (>80%) impose substantial false-positive costs, blocking 150-250 legitimate technical queries per 1,000 detected attacks. No evaluated defense achieves both TPR >80% AND FPR <10%.

ROC curves for six defense mechanisms
Figure 4: Defense Mechanism ROC Curves — TPR vs FPR for Each Defense

4.6 Base-Rate Impact on Defense Performance

Base Rate Attacks (n) Benign (n) TP FP Precision FP per TP Operational Cost
50% 500 500 445 125 78.1% 0.28 Acceptable
10% 100 900 89 225 28.3% 2.53 High burden
5% 50 950 45 238 15.9% 5.29 Severe burden
1% 10 990 9 248 3.5% 27.56 Impractical
0.1% 1 999 0.89 250 0.35% 280.9 Undeployable

Critical Finding: At realistic attack prevalence (1% or lower), even high-performance defenses impose severe operational costs. The Ensemble Defense blocks 28 legitimate queries for every 1 attack detected at 1% base rate, making deployment impractical for production APIs.

Base rate impact on defense performance metrics
Figure 5: Base-Rate Impact on Defense Performance — F1 Scores Decline Dramatically at Realistic Attack Rates
Confusion matrices at different base rates
Figure 6: Base-Rate Impact on Confusion Matrices — 50% vs 1% Attack Prevalence

4.7 Cost Analysis

Component Queries Cost ($) Notes
Cross-Vendor Property Disclosure
Anthropic (Claude) 359 0.22 $0.000613/query
OpenAI (GPT-4/5.1/5.2) 599 2.65 $0.00442/query
DeepSeek 248 0.07 $0.000282/query
Gemini (Google) 383 0.05 $0.000131/query
Mistral AI 116 0.08 $0.000690/query
Subtotal 1,717 3.10 $0.00180 avg
Baseline Validation Study
Direct queries + labeling 80 0.07
Grand Total 1,797 3.17 Verified via vendor billing
Cost breakdown by vendor and model
Figure 7: Cost Breakdown Analysis — Per-Query Costs Across Vendors

5. Discussion

5.1 Implications

Our cross-vendor testing of 17 models from 6 major AI providers reveals critical insights about property disclosure in production LLM APIs:

  1. Vendor-Specific Policies Dominate: The 56.5 percentage point gap between Google (71.3%) and Anthropic (14.8%) Tier 1 disclosure shows that property protection is NOT universal across vendors. Organizations cannot assume uniform disclosure risk when selecting AI providers—vendor choice directly impacts information exposure.
  2. Intellectual Property Risk Varies by Provider: Google's exceptionally high disclosure across all properties (71.3% overall, 97.4% for Incremental Context technique) suggests minimal property protection, while Anthropic's low disclosure (14.8% overall, 0% for parameter count) indicates the strongest protection policies.
  3. Technique Effectiveness Depends on Vendor Context: Evasion technique success varies dramatically by vendor. Incremental Context shows 18.4× effectiveness variance (97.4% Google vs 5.3% OpenAI), while Hybrid Legitimization is more consistent (38.9-71.2% range). Attackers can optimize technique selection based on target vendor.
  4. Protection Evolution Observed: GPT-5.2's 28.4 percentage point improvement over GPT-5.1 (13.3% vs 41.7%) shows that vendor policies can evolve rapidly. Organizations must continuously reassess disclosure risk rather than relying on static vendor profiles.

5.2 Vendor-Specific Findings

Google Gemini (Most Disclosive):

Anthropic Claude (Most Protective):

5.3 Attack Scenarios Enabled by Disclosed Properties

Property disclosure isn't theoretical—it enables concrete attacks:

Scenario 1: Adversarial Prompt Optimization

Disclosed Property: Context window = 200K tokens (Tier 1, verified)

Attack Enabled: Adversarial researchers craft jailbreak prompts that exploit the exact token limit. By positioning malicious instructions at token 199,500, attackers force the model to process attack payloads in the "forgetting window" where earlier safety instructions decay. Without confirmed context limits, attackers must guess—verified limits enable precise optimization.

Cost to Attacker: With confirmed 200K limit: $50-100 for optimization testing. Without confirmation: $2,000+ for blind search.

Scenario 2: Model Cloning Economics

Disclosed Property: Parameter count = 7 billion (Tier 1)

Attack Enabled: Competitors calculate distillation training costs. A confirmed 7B model requires approximately 10M synthetic queries for 85%+ fidelity cloning. At $0.10/1K tokens output, that's $15,000 in API costs. Without parameter confirmation, attackers must train multiple clones at 3× cost uncertainty.

Cost Reduction: Parameter disclosure saves $30,000+ in misdirected cloning attempts.

5.4 Limitations

  1. Expanded but Still Limited Vendor Coverage: While we tested 17 models across 5 major vendors, this represents only a subset of the LLM ecosystem. Other vendors (Meta, Cohere, AI21, etc.) may exhibit different disclosure patterns.
  2. Evasion Techniques Only: Only 3 evasion techniques empirically tested; baseline techniques not empirically evaluated. Additional evasion strategies may achieve different disclosure rates.
  3. Property Verification Constraints: Only context window behaviorally verifiable through token limit testing. Parameter counts, training approaches, and performance characteristics remain unverifiable without provider ground truth.
  4. Temporal Validity: Results represent December 2025 snapshot. Vendor disclosure policies may change over time as providers update alignment training, system prompts, or safety mechanisms.

7. Conclusion

This paper presents the first rigorous cross-vendor empirical measurement of LLM property-query disclosure patterns. Through comprehensive testing of 17 frontier models from 5 major vendors with 1,717 empirical queries (identical prompts, consistent 3-tier taxonomy, deterministic sampling) and complete audit trails, we show that property disclosure varies dramatically by vendor, not by universal model behaviors.

Primary Finding — Vendor-Specific Policies Dominate:

The 56.5 percentage point disclosure gap between Google (71.3% Tier 1) and Anthropic (14.8% Tier 1) establishes that property protection is NOT a universal LLM characteristic. Organizations selecting AI providers must conduct vendor-specific risk assessments rather than assuming uniform disclosure policies across the ecosystem.

Key Cross-Vendor Findings:

  1. Vendor Rankings: Google most disclosive (71.3%), followed by Mistral AI (51.7%), DeepSeek (35.9%), OpenAI (20.9%), and Anthropic most protective (14.8%). This 4.8× variance shows fundamental policy differences across vendors.
  2. Property Protection Varies by Vendor: Context window disclosure ranges 30.2-72.3% across vendors, while parameter count ranges 0-63.0%. Some vendors (Anthropic, OpenAI) protect all sensitive properties; others (Google, Mistral AI) disclose broadly.
  3. Technique Effectiveness Depends on Vendor: Incremental Context shows 18.4× effectiveness variance (97.4% Google vs 5.3% OpenAI). Hybrid Legitimization most consistent (38.9-71.2% range). Attackers can optimize technique selection per target vendor.
  4. Overall Disclosure Patterns: Across all vendors, 34.9% (600/1,717) specific exploitable disclosures (Tier 1), 23.8% (409/1,717) vague acknowledgments (Tier 2), 41.2% (708/1,717) refusals (Tier 3).
  5. Defense Trade-offs Persist: In our benchmark, high detection rates (>80%) impose substantial false-positive costs (150-250 blocked legitimate queries per 1,000 attacks at 1% base rate), independent of vendor selection.

Vendor-Specific Recommendations:

Research Contributions:

We provide the first cross-vendor LLM property disclosure dataset (1,717 labeled responses), establish vendor-specific disclosure baselines across 5 major providers, show that vendor policies dominate over universal behaviors, and show that defense trade-offs persist across all tested vendors. Our detection benchmark (1,200 labeled queries) and base-rate-adjusted defense analysis show that operational costs remain challenging regardless of vendor selection.

Future Work:

Expand empirical validation to additional vendors (Meta, Cohere, AI21), conduct longitudinal studies to assess temporal stability of vendor policies (especially given GPT-5.2's observed protection improvements), explore vendor-specific defense strategies optimized for different disclosure profiles, and investigate whether disclosed properties enable successful model cloning or competitive intelligence gathering (ground truth validation beyond behavioral testing).

Research timeline and future work roadmap
Figure 8: Research Timeline and Future Work Roadmap

Citation

Cite this paper:

Thornton, S. (2026). Measuring Self-Disclosure in LLM APIs: Property Claims,
Disclosure Patterns, and Defense Trade-offs. perfecXion.ai Research.