@sam-saffron-jarvis
Created March 4, 2026 22:15
Claude Code compaction deep dive v2.1.68 — prompts, threshold, 5 mechanisms, edge cases (deobfuscated)

Claude Code — Compaction Deep Dive

Package: @anthropic-ai/claude-code v2.1.68
Source: Deobfuscated from the 12MB minified cli.js bundle.
"Want to see the unminified source? We're hiring!" — Anthropic


Five compaction mechanisms

Claude Code has five distinct context management mechanisms:

| Mechanism | LLM? | Trigger | Scope |
|---|---|---|---|
| Auto full compact | Yes | After turn, tokens ≥ threshold | Full history replacement |
| Manual /compact | Yes | User command | Full or partial |
| Sub-agent compact | Yes | Before turn in sub-agent loop | Sub-agent history |
| Microcompact | No | Warning threshold hit | Clears old tool results only |
| Session memory compact | No | Auto-compact trigger | Uses stored session memory |

Prompt 1: Full compact ($M4)

System prompt:

You are a helpful AI assistant tasked with summarizing conversations.

User prompt (appended after the full history):

Your task is to create a detailed summary of the conversation so far, paying close attention to
the user's explicit requests and your previous actions.
This summary should be thorough in capturing technical details, code patterns, and architectural
decisions that would be essential for continuing development work without losing context.

Before providing your final summary, wrap your analysis in <analysis> tags to organize your
thoughts and ensure you've covered all necessary points. In your analysis process:

1. Chronologically analyze each message and section of the conversation. For each section
   thoroughly identify:
   - The user's explicit requests and intents
   - Your approach to addressing the user's requests
   - Key decisions, technical concepts and code patterns
   - Specific details like:
     - file names
     - full code snippets
     - function signatures
     - file edits
   - Errors that you ran into and how you fixed them
   - Pay special attention to specific user feedback that you received, especially if the user
     told you to do something differently.
2. Double-check for technical accuracy and completeness, addressing each required element
   thoroughly.

Your summary should include the following sections:

1. Primary Request and Intent: Capture all of the user's explicit requests and intents in detail
2. Key Technical Concepts: List all important technical concepts, technologies, and frameworks
   discussed.
3. Files and Code Sections: Enumerate specific files and code sections examined, modified, or
   created. Pay special attention to the most recent messages and include full code snippets
   where applicable and include a summary of why this file read or edit is important.
4. Errors and fixes: List all errors that you ran into, and how you fixed them. Pay special
   attention to specific user feedback that you received, especially if the user told you to do
   something differently.
5. Problem Solving: Document problems solved and any ongoing troubleshooting efforts.
6. All user messages: List ALL user messages that are not tool results. These are critical for
   understanding the users' feedback and changing intent.
7. Pending Tasks: Outline any pending tasks that you have explicitly been asked to work on.
8. Current Work: Describe in detail precisely what was being worked on immediately before this
   summary request, paying special attention to the most recent messages from both user and
   assistant. Include file names and code snippets where applicable.
9. Optional Next Step: List the next step that you will take that is related to the most recent
   work you were doing. IMPORTANT: ensure that this step is DIRECTLY in line with the user's
   most recent explicit requests, and the task you were working on immediately before this
   summary request. [...]
   If there is a next step, include direct quotes from the most recent
   conversation showing exactly what task you were working on and where you left off. This should
   be verbatim to ensure there's no drift in task interpretation.

[example output structure with <analysis> and <summary> XML blocks...]

[If PreCompact hook or /compact instructions provided:]
Additional Instructions:
{customInstructions}

IMPORTANT: Do NOT use any tools. You MUST respond with ONLY the <summary>...</summary> block
as your text output.
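
Because the prompt asks for an <analysis> scratchpad followed by a <summary> block, the caller has to pull the summary out of the model's text. The following is a minimal sketch of that step, not the bundle's code: the helper name extractSummary is hypothetical, and the error message is the one quoted under Edge cases.

```javascript
// Hypothetical helper: return the body of the first <summary>...</summary> block.
// The bundle's own extraction logic may differ; this only illustrates the idea.
function extractSummary(responseText) {
  const match = /<summary>([\s\S]*?)<\/summary>/.exec(responseText);
  if (!match || match[1].trim() === "") {
    // Mirrors the "empty response" edge case described in this write-up.
    throw new Error("Failed to generate conversation summary");
  }
  return match[1].trim();
}

const sampleResponse =
  "<analysis>scratch notes</analysis>\n" +
  "<summary>\n1. Primary Request and Intent: fix the login bug\n</summary>";
console.log(extractSummary(sampleResponse));
```

The non-greedy match stops at the first closing tag, so a summary that itself quotes a literal </summary> would be truncated; a production version would need to handle that case.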

Prompt 2: Partial compact (_M4, used with /compact from a message index)

Identical structure to Prompt 1, but scoped to "RECENT portion only":

Your task is to create a detailed summary of the RECENT portion of the conversation — the
messages that follow earlier retained context. The earlier messages are being kept intact and
do NOT need to be summarized. Focus your summary on what was discussed, learned, and
accomplished in the recent messages only.
[...same 9-section structure...]
Please provide your summary based on the RECENT messages only (after the retained earlier
context), following this structure and ensuring precision and thoroughness in your response.

Prompt 3: Sub-agent continuation (Q07)

Used when an in-process sub-agent approaches its context limit — fires before the next turn as a user message:

You have been working on the task described above but have not yet completed it. Write a
continuation summary that will allow you (or another instance of yourself) to resume work
efficiently in a future context window where the conversation history will be replaced with
this summary. Your summary should be structured, concise, and actionable. Include:
1. Task Overview
   The user's core request and success criteria
   Any clarifications or constraints they specified
2. Current State
   What has been completed so far
   Files created, modified, or analyzed (with paths if relevant)
   Key outputs or artifacts produced
3. Important Discoveries
   Technical constraints or requirements uncovered
   Decisions made and their rationale
   Errors encountered and how they were resolved
   What approaches were tried that didn't work (and why)
4. Next Steps
   Specific actions needed to complete the task
   Any blockers or open questions to resolve
   Priority order if multiple steps remain
5. Context to Preserve
   User preferences or style requirements
   Domain-specific details that aren't obvious
   Any promises made to the user
Be concise but complete — err on the side of including information that would prevent duplicate
work or repeated mistakes. Write in a way that enables immediate resumption of the task.
Wrap your summary in <summary></summary> tags.

Continuation message injected after compaction (JQ6)

After compaction, history starts with this user message:

This session is being continued from a previous conversation that ran out of context. The
summary below covers the earlier portion of the conversation.

[<analysis> block reformatted as plain text]
[<summary> block reformatted as plain text]

[If transcript available:]
If you need specific details from before compaction (like exact code snippets, error messages,
or content you generated), read the full transcript at: {transcriptPath}

[If partial compact — some messages kept verbatim:]
Recent messages are preserved verbatim.

[If auto-compact:]
Please continue the conversation from where we left off without asking the user any further
questions. Continue with the last task that you were asked to work on.
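
The conditional pieces of this message can be sketched as a small builder. This is an illustrative reconstruction only: the function name, the option names (transcriptPath, keptRecent, isAuto) and the example path are assumptions, and the summary argument stands for the already reformatted analysis and summary text.

```javascript
// Hypothetical builder for the post-compaction user message.
// Only the quoted sentences come from the write-up; the API shape is assumed.
function buildContinuationMessage(summary, { transcriptPath, keptRecent = false, isAuto = false } = {}) {
  const parts = [
    "This session is being continued from a previous conversation that ran out of context. " +
      "The summary below covers the earlier portion of the conversation.",
    summary,
  ];
  if (transcriptPath) {
    parts.push(
      "If you need specific details from before compaction (like exact code snippets, error messages, " +
        `or content you generated), read the full transcript at: ${transcriptPath}`
    );
  }
  // Partial compact: some recent messages follow the summary untouched.
  if (keptRecent) parts.push("Recent messages are preserved verbatim.");
  // Auto-compact: tell the model to carry on without prompting the user.
  if (isAuto) {
    parts.push(
      "Please continue the conversation from where we left off without asking the user any further " +
        "questions. Continue with the last task that you were asked to work on."
    );
  }
  return parts.join("\n\n");
}

const continuation = buildContinuationMessage("1. Primary Request and Intent: ...", {
  transcriptPath: "/tmp/example-transcript.jsonl", // made-up path for the example
  isAuto: true,
});
console.log(continuation);
```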

Threshold

// Constants
const BcK = 200_000  // default context window for non-extended models
const wk8 = 13_000   // safety buffer
const a5Y = 20_000   // output reservation cap

function B96(model) {
  // Usable window = context - output reservation
  return contextWindow(model) - Math.min(maxOutputTokens(model), 20_000)
}

function fQ6(model) {
  // Auto-compact fires here
  const base = B96(model) - 13_000  // additional safety buffer
  const override = process.env.CLAUDE_AUTOCOMPACT_PCT_OVERRIDE  // float, 1–100
  const pct = parseFloat(override)  // NaN when the variable is unset
  if (pct > 0) return Math.min(Math.floor(B96(model) * pct / 100), base)
  return base
}

Example (Claude 3.5 Sonnet: 200k context, 8192 max output):

B96 = 200000 - min(8192, 20000) = 191808
fQ6 = 191808 - 13000 = 178808  ≈ 89.4% of context

Warning display: shown at fQ6 - 20000 (contextWindow − min(maxOutput, 20k) − 33k when no override is set)
Blocking limit: contextWindow - 3000 (absolute hard stop)

Uses actual token count from API response (input_tokens + cache_* + output_tokens).
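
To see all four limits side by side, the arithmetic above can be folded into one function. This is a sketch under stated assumptions: the function and field names are illustrative rather than the minified identifiers, the model's context window and max output are passed in directly, and the warning level is taken as the auto-compact threshold minus 20,000 as described above.

```javascript
// Constants as listed in this section; names are descriptive stand-ins.
const SAFETY_BUFFER = 13_000;
const OUTPUT_RESERVATION_CAP = 20_000;
const WARNING_OFFSET = 20_000;
const BLOCKING_MARGIN = 3_000;

function compactionThresholds(contextWindow, maxOutputTokens, pctOverride) {
  // Usable window = context minus the (capped) output reservation.
  const usable = contextWindow - Math.min(maxOutputTokens, OUTPUT_RESERVATION_CAP);
  const base = usable - SAFETY_BUFFER;
  const pct = Number.parseFloat(pctOverride); // NaN when no override is given
  // The override can only lower the threshold: it is capped at the default.
  const autoCompact = Number.isFinite(pct)
    ? Math.min(Math.floor((usable * pct) / 100), base)
    : base;
  return {
    usable,
    autoCompact,
    warning: autoCompact - WARNING_OFFSET,
    blocking: contextWindow - BLOCKING_MARGIN,
  };
}

const defaults = compactionThresholds(200_000, 8_192);
console.log(defaults);
// usable 191808, autoCompact 178808, warning 158808, blocking 197000

console.log(compactionThresholds(200_000, 8_192, "50").autoCompact); // 95904
console.log(compactionThresholds(200_000, 8_192, "99").autoCompact); // 178808 (capped)
```

Note that an override of 99 has no effect here: 99% of the usable window exceeds the default, so the default wins.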


Full mechanism: history replacement (extract)

Before:

[user msg 1] [assistant 1] [user msg 2] [assistant 2] ... [user msg N]

bG6() flow:

  1. Count tokens (tk(messages))
  2. Run PreCompact hooks → may inject custom instructions into prompt
  3. Check session memory (QP1) — if stored summary exists and fits, skip LLM
  4. Build API request: full history + summary prompt → model (same as conversation)
    • thinkingConfig: { type: "disabled" } — extended thinking turned off
    • maxOutputTokensOverride: 20000
    • Tools: read_file only
  5. Stream response, extract <summary>...</summary> block
  6. Clear readFileState
  7. Re-inject: recently-read files (bM4), plan file (IP1), skills (uM4), plan mode (mM4)
  8. Run session start hooks
  9. Return: { boundaryMarker, summaryMessages, attachments, hookResults }

After (via A66()):

[boundaryMarker: "Conversation compacted"]
[summaryMessage: JQ6(summary, ...)]
[messagesToKeep: verbatim recent msgs — partial compact only]
[attachments: re-injected files/skills/plan]
[hookResults: session start outputs]
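
The ordering above amounts to concatenating the pieces of the compaction result. A minimal sketch, assuming the field names from the result object listed in the flow and a made-up message shape ({ type, content }) that is not taken from the bundle:

```javascript
// Illustrative only: rebuild the history from a compaction result.
// messagesToKeep is empty for a full compact and populated for a partial one.
function assembleCompactedHistory({
  boundaryMarker,
  summaryMessages,
  messagesToKeep = [],
  attachments = [],
  hookResults = [],
}) {
  return [boundaryMarker, ...summaryMessages, ...messagesToKeep, ...attachments, ...hookResults];
}

const rebuilt = assembleCompactedHistory({
  boundaryMarker: { type: "boundary", content: "Conversation compacted" },
  summaryMessages: [{ type: "user", content: "This session is being continued ..." }],
  attachments: [{ type: "attachment", content: "recently read file" }],
  hookResults: [{ type: "hook", content: "session start output" }],
});
console.log(rebuilt.map((m) => m.type));
```

Everything that preceded the boundary marker is gone from the model's view; only the summary and the re-injected attachments carry earlier context forward.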

Cache sharing (feature flag tengu_compact_cache_prefix): before calling the LLM, tries to reuse a compaction result cached from another session with the same conversation prefix. Falls back on miss.


Microcompact: no-LLM tool result clearing

Function Rg() — runs during message serialization, before each API call.

Constants:

const g3Y = 40_000  // protect this many tokens of recent tool results
const F3Y = 3       // always keep last N tool results intact
const B3Y = 20_000  // minimum savings threshold to bother
const eV8 = 2_000   // estimated tokens per image/document

Trigger: isAboveWarningThreshold AND clearable tool result tokens > 20k

Algorithm:

  1. Find all tool_use/tool_result pairs for eligible tools (bash, read_file, grep, etc.)
  2. Keep last F3Y=3 tool results protected always
  3. Scan backwards: accumulate tool result sizes until > g3Y=40k tokens counted
  4. Everything beyond that 40k window: eligible to clear
  5. If eligible tokens > B3Y=20k: strip them
    • Tool results → "[Tool result cleared]" (or saved to temp file with re-read instructions)
    • Images/documents in user messages → "[image]" / "[document]"
  6. Cleared tool IDs tracked in U96 set (persists across turns)

No LLM call. Purely in-memory message transformation.
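
The algorithm above can be sketched in a few lines. This is one reading of the steps, not the bundle's code: the input shape ({ id, tokens, content }), the function signature and the exact boundary handling (the result that pushes the running total past 40k stays protected) are assumptions, and the temp-file and image/document variants are left out.

```javascript
const PROTECT_TOKENS = 40_000; // g3Y: recent tool-result tokens to protect
const KEEP_LAST = 3;           // F3Y: newest results that are never cleared
const MIN_SAVINGS = 20_000;    // B3Y: skip clearing below this many tokens

// toolResults: [{ id, tokens, content }], oldest first (hypothetical shape).
// clearedIds: Set carried across turns, standing in for the U96 set.
function microcompact(toolResults, { aboveWarning, clearedIds }) {
  let counted = 0;
  const eligible = [];
  // Walk newest to oldest, protecting results until more than 40k tokens are counted.
  for (let i = toolResults.length - 1; i >= 0; i--) {
    const result = toolResults[i];
    if (clearedIds.has(result.id)) continue; // cleared on an earlier turn
    const fromEnd = toolResults.length - 1 - i;
    if (fromEnd < KEEP_LAST || counted <= PROTECT_TOKENS) {
      counted += result.tokens;
    } else {
      eligible.push(result);
    }
  }
  const savings = eligible.reduce((sum, r) => sum + r.tokens, 0);
  if (aboveWarning && savings > MIN_SAVINGS) {
    for (const r of eligible) clearedIds.add(r.id);
  }
  // Re-apply every recorded clearing, including ones from earlier turns.
  return toolResults.map((r) =>
    clearedIds.has(r.id) ? { ...r, content: "[Tool result cleared]" } : r
  );
}

const demoResults = [1, 2, 3, 4, 5, 6].map((n) => ({
  id: `t${n}`,
  tokens: 15_000,
  content: `output ${n}`,
}));
const demoCleared = new Set();
const compacted = microcompact(demoResults, { aboveWarning: true, clearedIds: demoCleared });
console.log(compacted.map((r) => r.content));
```

In this example the three newest results (45k tokens) are protected, and the three oldest (45k tokens, above the 20k minimum) are replaced with the placeholder.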


Configuration

Settings UI: autoCompactEnabled toggle — "Auto-compact when context is full"

Environment variables:

| Variable | Effect |
|---|---|
| DISABLE_COMPACT | Disable ALL compaction including /compact command |
| DISABLE_AUTO_COMPACT | Disable auto-compact only; /compact still works |
| DISABLE_MICROCOMPACT | Disable microcompact |
| CLAUDE_AUTOCOMPACT_PCT_OVERRIDE | Float 1–100: trigger at this % of B96 (max: default) |
| CLAUDE_CODE_MAX_OUTPUT_TOKENS | Override model max output tokens |
| CLAUDE_CODE_BLOCKING_LIMIT_OVERRIDE | Override hard blocking limit |
| CLAUDE_AFTER_LAST_COMPACT | Fetch only session logs after last compact point |
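
How the three DISABLE_* switches might combine is sketched below. Several things here are assumptions rather than findings: the helper names, the rule for what counts as a "set" value (any non-empty string other than "0" or "false"), and the reading that "ALL compaction" in DISABLE_COMPACT also covers microcompact.

```javascript
// Hypothetical truthiness rule; the bundle's actual parsing is not shown in this write-up.
function isSet(value) {
  return value !== undefined && value !== "" && value !== "0" && value.toLowerCase() !== "false";
}

// env is passed in (for example process.env) so the function is easy to exercise.
function compactionSwitches(env) {
  const disableAll = isSet(env.DISABLE_COMPACT);
  return {
    manualCompact: !disableAll,
    autoCompact: !disableAll && !isSet(env.DISABLE_AUTO_COMPACT),
    // Assumption: DISABLE_COMPACT also turns off microcompact.
    microcompact: !disableAll && !isSet(env.DISABLE_MICROCOMPACT),
  };
}

console.log(compactionSwitches({ DISABLE_AUTO_COMPACT: "1" }));
// manualCompact true, autoCompact false, microcompact true
```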

Edge cases

  1. Compaction of compaction: if the compaction call itself returns a "compact" result (overflow during compaction), it throws ContextOverflowError
  2. Empty response: throws "Failed to generate conversation summary"
  3. API error response (WO prefix): re-throws with original error
  4. Prompt too long (Mc prefix): throws user-facing "context too large to compact"
  5. Auto-compact failure: silently returns { wasCompacted: false }; the session continues with the uncompacted history
  6. Microcompact + compaction: both can fire in the same turn; microcompact runs inline during serialization, full compaction runs after
  7. Session memory fallback: if session memory compaction result still exceeds threshold, falls through to LLM compaction
  8. Streaming retry: tengu_compact_streaming_retry flag enables retrying compaction up to k5Y times on stream failure

Summary

| Property | Value |
|---|---|
| Mechanism | Full history replacement (extract) |
| Threshold | contextWindow - min(maxOutputTokens, 20k) - 13k |
| Example (Sonnet 3.5) | 178,808 / 200,000 ≈ 89.4% |
| Token source | Actual API token count (not estimated) |
| Configurable | Yes: env CLAUDE_AUTOCOMPACT_PCT_OVERRIDE, settings toggle |
| Prompt | 9-section structured <analysis> + <summary> XML |
| History to LLM | Full history, no truncation |
| Model | Same mainLoopModel as conversation |
| Max output | 20,000 tokens (hardcoded override) |
| Extended thinking | Disabled during compaction |
| Post-compaction | Boundary marker + continuation message + re-injected files/skills/plan |
| Microcompact | Separate, no LLM, inline tool result clearing |
| Hook | PreCompact: can inject instructions into summary prompt |
| Cache sharing | Experimental: reuse across sessions with same prefix |