@sam-saffron-jarvis
Created March 4, 2026 22:36
Pi Coding Agent — Compaction/Compression Deep Dive

Pi — Compression/Compaction Analysis

Repo: https://github.com/badlogic/pi-mono
Commit: f430dce
Language: TypeScript (terminal coding agent)


Summary

Pi calls its compression system "compaction". It's a full extract mechanism with incremental summary updating. Old messages are discarded; new effective context = compaction summary message + recent messages. The summary is written to a persistent session file (session.jsonl), and the agent is reloaded from that checkpoint. A key differentiator: Pi uses two different prompts — an initial prompt for first compaction and an update prompt for subsequent compactions that explicitly instructs the model to merge with the previous summary.


Mechanism: Extract with Iterative Summary Update

src/core/compaction/compaction.ts: compact(), prepareCompaction(), generateSummary()

Full extract, not a piggyback: Pi makes a dedicated LLM call for the summary (see Three-Layer Prompt Architecture). After compaction:

  • Messages before firstKeptEntryId are removed from the in-memory context
  • A CompactionEntry (with the summary text) is written to session.jsonl
  • The session is reloaded: context = [compactionSummaryMessage] + [kept recent messages]

Iterative update: If a previous compaction exists, generateSummary() receives previousSummary and uses the UPDATE_SUMMARIZATION_PROMPT which explicitly instructs the model to merge the prior summary with the new messages, not start fresh. This avoids re-summarising the full history each time.
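The prompt selection can be sketched as below. This is a minimal illustration, not Pi's actual code: buildSummaryRequest is a hypothetical name, the two constants are short placeholders for the real prompt text, and the message layout follows the prompt listings later in this document.

```typescript
// Placeholder stand-ins for Pi's prompt constants; the real prompts are much longer.
const SUMMARIZATION_PROMPT = "Create a structured context checkpoint summary.";
const UPDATE_SUMMARIZATION_PROMPT = "Update the existing structured summary with new information.";

// Hypothetical helper: builds the single user message sent to the summarization call.
function buildSummaryRequest(
  serializedConversation: string,
  previousSummary?: string,
  customInstructions?: string,
): string {
  let text = `<conversation>\n${serializedConversation}\n</conversation>\n\n`;
  let prompt = SUMMARIZATION_PROMPT;

  // A previous summary switches the request to the update prompt,
  // so the model merges instead of starting fresh.
  if (previousSummary) {
    text += `<previous-summary>\n${previousSummary}\n</previous-summary>\n\n`;
    prompt = UPDATE_SUMMARIZATION_PROMPT;
  }

  // Optional user focus is appended to whichever prompt was chosen.
  if (customInstructions) {
    prompt += `\n\nAdditional focus: ${customInstructions}`;
  }
  return text + prompt;
}
```

On the first compaction only the conversation and the initial prompt are sent; every later compaction carries the prior summary, so only the new messages need to be serialized.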


Threshold

src/core/compaction/compaction.ts: shouldCompact()
src/core/settings-manager.ts: getCompactionSettings()

export function shouldCompact(contextTokens: number, contextWindow: number, settings: CompactionSettings): boolean {
    if (!settings.enabled) return false;
    return contextTokens > contextWindow - settings.reserveTokens;
}

Compaction fires when: contextTokens > contextWindow - reserveTokens

Defaults:

  • reserveTokens = 16384 — reserved headroom (for the system prompt + LLM response)
  • keepRecentTokens = 20000 — how much recent context to preserve after compaction
  • enabled = true

So with a 200k-token model (e.g. Claude Sonnet): fires at 200k - 16k = 184k tokens ≈ 92% fill.
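As a worked check of that arithmetic, the snippet below re-declares shouldCompact() as quoted above and runs it with the default settings. The CompactionSettings interface here is an assumed shape built from the three settings this document lists.

```typescript
// Assumed shape, based on the three settings named in this document.
interface CompactionSettings {
  enabled: boolean;
  reserveTokens: number;
  keepRecentTokens: number;
}

function shouldCompact(contextTokens: number, contextWindow: number, settings: CompactionSettings): boolean {
  if (!settings.enabled) return false;
  return contextTokens > contextWindow - settings.reserveTokens;
}

const defaults: CompactionSettings = { enabled: true, reserveTokens: 16384, keepRecentTokens: 20000 };
const contextWindow = 200_000;

// Exact threshold: 200000 - 16384 = 183616 tokens, about 91.8% of the window.
const threshold = contextWindow - defaults.reserveTokens;
const fillPercent = (threshold / contextWindow) * 100;
console.log(threshold, fillPercent.toFixed(1)); // 183616 91.8

// The comparison is strict, so compaction fires one token past the threshold.
console.log(shouldCompact(183_616, contextWindow, defaults)); // false
console.log(shouldCompact(183_617, contextWindow, defaults)); // true
```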

Token estimation

Pi uses the actual API-reported usage from the last assistant message (calculateContextTokens(usage)) rather than counting tokens locally. If no usage is available (e.g. no assistant turn yet), it falls back to a chars / 4 heuristic applied to each message block.
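A rough sketch of the two estimation paths follows. The Usage field names are taken from the formula quoted under the post-turn trigger. estimateContextTokens, estimateTokensFromText and the Math.ceil rounding are assumptions for illustration, not Pi's actual helpers.

```typescript
// Assumed usage shape; field names follow the formula quoted in this document.
interface Usage {
  totalTokens?: number;
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// Prefer the provider-reported total; otherwise sum the individual counters.
function calculateContextTokens(usage: Usage): number {
  return usage.totalTokens || usage.input + usage.output + usage.cacheRead + usage.cacheWrite;
}

// Fallback heuristic: roughly four characters per token (rounding up is an assumption).
function estimateTokensFromText(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical wrapper: use real usage when present, else estimate per message block.
function estimateContextTokens(lastUsage: Usage | undefined, messageTexts: string[]): number {
  if (lastUsage) return calculateContextTokens(lastUsage);
  return messageTexts.reduce((sum, text) => sum + estimateTokensFromText(text), 0);
}
```

Using reported usage keeps the threshold check aligned with what the provider actually counted; the character heuristic only matters before the first assistant turn.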

Configurability

All three settings are user-configurable in settings.json:

{
  "compaction": {
    "enabled": true,       // master toggle
    "reserveTokens": 16384, // headroom to leave at top
    "keepRecentTokens": 20000 // how much recent history to keep
  }
}

Settings can be changed via /settings in the UI or by editing the file directly.
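One plausible way such settings resolve is to overlay the user's partial config on the defaults listed above. The sketch below assumes that behaviour; resolveCompactionSettings is a hypothetical name and is not taken from settings-manager.ts.

```typescript
// Assumed shape, based on the three settings named in this document.
interface CompactionSettings {
  enabled: boolean;
  reserveTokens: number;
  keepRecentTokens: number;
}

// Defaults as listed in this document.
const DEFAULT_COMPACTION_SETTINGS: CompactionSettings = {
  enabled: true,
  reserveTokens: 16384,
  keepRecentTokens: 20000,
};

// Hypothetical resolver: user-supplied keys override defaults, missing keys fall back.
function resolveCompactionSettings(user: Partial<CompactionSettings> = {}): CompactionSettings {
  return { ...DEFAULT_COMPACTION_SETTINGS, ...user };
}

// Example: only the headroom is customised; the other two keep their defaults.
const resolved = resolveCompactionSettings({ reserveTokens: 32000 });
console.log(resolved);
```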


Trigger Points

Two automatic trigger points in _checkCompaction(), plus a manual slash command:

1. Post-turn (threshold)

agent-session.ts ~line 447

After every agent_end event, if the last assistant message has valid usage data:

contextTokens = usage.totalTokens || (input + output + cacheRead + cacheWrite)
if contextTokens > contextWindow - reserveTokens → auto-compact

2. Pre-turn (overflow recovery)

agent-session.ts ~line 888

Before sending each user message, the last assistant message is re-checked. If it was a context overflow error, compaction is triggered immediately with willRetry = true (the failed request is retried after compaction).

Single overflow recovery attempt: _overflowRecoveryAttempted flag prevents infinite loops. If a second overflow occurs after recovery, the user is told to reduce context or switch model.

Model-change guard: Overflow errors from a different model (e.g. user switched from Opus to Sonnet) are ignored — the error's provider and model fields must match the current model.

Post-compaction guard: If the overflow error timestamp predates the latest compaction, it's ignored (the error is in the preserved region and shouldn't trigger a new compaction).
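The three guards can be combined into a single decision, sketched below. All type and function names are hypothetical, and the order in which the guards are evaluated is an assumption; only the conditions themselves come from the description above.

```typescript
// Hypothetical shapes for illustration only.
interface ModelRef { provider: string; id: string; }
interface OverflowError { provider: string; model: string; timestamp: number; }

interface RecoveryState {
  currentModel: ModelRef;
  lastCompactionTimestamp?: number;
  overflowRecoveryAttempted: boolean;
}

type RecoveryDecision = "compact-and-retry" | "ignore" | "give-up";

function decideOverflowRecovery(error: OverflowError, state: RecoveryState): RecoveryDecision {
  // Model-change guard: the error must come from the currently selected model.
  const sameModel =
    error.provider === state.currentModel.provider && error.model === state.currentModel.id;
  if (!sameModel) return "ignore";

  // Post-compaction guard: errors older than the latest compaction are stale.
  if (state.lastCompactionTimestamp !== undefined && error.timestamp < state.lastCompactionTimestamp) {
    return "ignore";
  }

  // Single-attempt guard: a second overflow after recovery is surfaced to the user.
  if (state.overflowRecoveryAttempted) return "give-up";

  state.overflowRecoveryAttempted = true;
  return "compact-and-retry";
}
```

The single-attempt flag is what bounds the loop: if compaction did not free enough room, retrying again would only repeat the same overflow.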

3. Manual — /compact slash command

src/core/slash-commands.ts

User can trigger compaction at any time. Accepts optional customInstructions string appended to the summarization prompt.


Cut Point Selection

src/core/compaction/compaction.ts: findCutPoint(), findValidCutPoints()

After deciding to compact, Pi must choose where to cut: which messages to summarize vs. which to keep.

Goal: keep approximately keepRecentTokens (default 20k tokens) of recent context.

Algorithm:

  1. Collect all valid cut points (indices of user, assistant, custom, bashExecution, branchSummary, compactionSummary messages — never cut at toolResult since it must follow its tool call)
  2. Walk backward from the end, accumulating estimated tokens
  3. Stop when accumulated tokens exceed keepRecentTokens → that's the cut point
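The three steps above can be sketched as a single backward scan. This is a simplified illustration, not Pi's findCutPoint(): the Entry shape, findCutIndex, the chars / 4 estimate, and the choice to snap forward to the nearest valid cut point are all assumptions.

```typescript
type Role =
  | "user" | "assistant" | "toolResult" | "custom"
  | "bashExecution" | "branchSummary" | "compactionSummary";

// Hypothetical, simplified entry shape.
interface Entry { role: Role; text: string; }

// Rough per-entry estimate (about four characters per token).
const estimateTokens = (entry: Entry): number => Math.ceil(entry.text.length / 4);

// Returns the index of the first entry to keep, or undefined if no cut is possible.
function findCutIndex(entries: Entry[], keepRecentTokens: number): number | undefined {
  // Step 1: every index except toolResult is a valid cut point,
  // because a tool result must stay attached to its tool call.
  const validCutPoints = entries
    .map((entry, index) => (entry.role === "toolResult" ? -1 : index))
    .filter((index) => index >= 0);

  // Step 2: walk backward from the newest entry, accumulating estimated tokens.
  let accumulated = 0;
  for (let i = entries.length - 1; i >= 0; i--) {
    accumulated += estimateTokens(entries[i]);
    // Step 3: once the budget is exceeded, snap to the first valid cut point
    // at or after this index (assumption: snapping forward keeps the kept region within budget).
    if (accumulated > keepRecentTokens) {
      return validCutPoints.find((cut) => cut >= i);
    }
  }
  // The whole history fits in the budget, so there is nothing to cut in this sketch.
  return undefined;
}
```

For example, with five entries of 10 estimated tokens each and a budget of 25, the scan exceeds the budget at index 2; if that entry is a toolResult, the cut moves to index 3.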

Split-turn handling: If the natural cut falls in the middle of a turn (the user message starting that turn would be excluded), Pi detects isSplitTurn = true and:

  • Generates a separate turn-prefix summary for the excluded portion of that turn
  • Merges it into the main summary: {main_summary}\n\n---\n\n**Turn Context (split turn):**\n\n{prefix_summary}
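The merge itself is plain string concatenation using the template quoted above. A minimal sketch, with mergeSplitTurnSummary as a hypothetical helper name:

```typescript
// Hypothetical helper: appends the turn-prefix summary only when a split turn occurred.
function mergeSplitTurnSummary(mainSummary: string, prefixSummary?: string): string {
  if (!prefixSummary) return mainSummary;
  return `${mainSummary}\n\n---\n\n**Turn Context (split turn):**\n\n${prefixSummary}`;
}

console.log(mergeSplitTurnSummary("## Goal\nRefactor the parser", "## Original Request\nRename parse()"));
```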

File Tracking

src/core/compaction/utils.ts: extractFileOpsFromMessage(), computeFileLists()

Pi tracks all file operations across the session and appends them to the summary:

  • read: files passed to the read tool
  • written: files passed to the write tool
  • edited: files passed to the edit tool

On each compaction, file ops are accumulated from:

  1. The previous compaction's details.readFiles / details.modifiedFiles (carried forward)
  2. Tool calls in the messages being summarized

Final lists are deduplicated: modifiedFiles = written ∪ edited, readFiles = read \ modifiedFiles.

Appended to summary as XML:

<read-files>
src/foo.ts
src/bar.ts
</read-files>

<modified-files>
src/baz.ts
</modified-files>

This gives the next model awareness of file history even after the actual tool calls are gone.
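The set arithmetic and the XML rendering can be sketched as follows. The function names echo the ones cited above, but the FileOps shape, the sorting, and the decision to omit empty blocks are assumptions rather than Pi's actual behaviour.

```typescript
// Hypothetical input shape: paths collected per tool across the session.
interface FileOps { read: string[]; written: string[]; edited: string[]; }

// modifiedFiles = written ∪ edited, readFiles = read \ modifiedFiles.
function computeFileLists(ops: FileOps): { readFiles: string[]; modifiedFiles: string[] } {
  const modified = new Set<string>(ops.written.concat(ops.edited));
  const readOnly = new Set<string>(ops.read.filter((path) => !modified.has(path)));
  return {
    readFiles: Array.from(readOnly).sort(),
    modifiedFiles: Array.from(modified).sort(),
  };
}

// Renders the two XML blocks appended to the summary (empty lists are skipped here).
function formatFileLists(readFiles: string[], modifiedFiles: string[]): string {
  const sections: string[] = [];
  if (readFiles.length > 0) {
    sections.push(`<read-files>\n${readFiles.join("\n")}\n</read-files>`);
  }
  if (modifiedFiles.length > 0) {
    sections.push(`<modified-files>\n${modifiedFiles.join("\n")}\n</modified-files>`);
  }
  return sections.join("\n\n");
}

const lists = computeFileLists({
  read: ["src/foo.ts", "src/bar.ts", "src/baz.ts"],
  written: ["src/baz.ts"],
  edited: [],
});
console.log(formatFileLists(lists.readFiles, lists.modifiedFiles));
```

A file that was read and later edited appears only under modified-files, which keeps the two lists disjoint.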


Extension Hook

agent-session.ts: session_before_compact event

Pi has an extension system. Before compaction runs, the session_before_compact event is emitted. Extensions can return a compaction result to override the built-in compaction entirely. The built-in compact() function only runs if no extension provides a result. This allows custom summarization strategies (e.g. structured compaction for specific workflows).
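The override behaviour amounts to "ask extensions first, fall back to the built-in". The sketch below illustrates that control flow only. Every type and function name is hypothetical, the handlers are synchronous to keep the example short (a real summarization call would be asynchronous), and the first-result-wins rule is an assumption.

```typescript
// Hypothetical shapes; Pi's real extension API is not reproduced here.
interface CompactionPreparation { messagesToSummarize: string[]; firstKeptEntryId: string; }
interface CompactionResult { summary: string; firstKeptEntryId: string; }
type BeforeCompactHandler = (prep: CompactionPreparation) => CompactionResult | undefined;

function runCompaction(
  prep: CompactionPreparation,
  handlers: BeforeCompactHandler[],
  builtInCompact: (prep: CompactionPreparation) => CompactionResult,
): CompactionResult {
  // Give each session_before_compact handler a chance to supply a result.
  for (const handler of handlers) {
    const result = handler(prep);
    if (result) return result; // an extension result replaces the built-in compaction
  }
  // No extension provided a result, so the built-in path runs.
  return builtInCompact(prep);
}

// Example: a structured-compaction extension that only handles long histories.
const structuredExtension: BeforeCompactHandler = (prep) =>
  prep.messagesToSummarize.length > 2
    ? { summary: "custom structured summary", firstKeptEntryId: prep.firstKeptEntryId }
    : undefined;
```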


Branch Summarization

src/core/compaction/branch-summarization.ts

Separate from session compaction. When a user creates a new "branch" in the conversation (Pi has a branching UI), a branch summary is generated for the diverging point. Same prompt structure (uses SUMMARIZATION_SYSTEM_PROMPT), but focused on capturing the branch context.


Summary Message in Context

After compaction, the session is reloaded. The effective context seen by the model:

[system prompt]
[CompactionSummaryMessage]  ← role: "user", contains summary text + <read-files> + <modified-files>
[kept recent messages]      ← last ~keepRecentTokens tokens

The CompactionSummaryMessage is a special message type rendered distinctly in the TUI (via compaction-summary-message.ts component).
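Rebuilding the message list from a compaction checkpoint can be sketched like this. The Message shape, the synthetic id, and buildContextAfterCompaction are hypothetical; the only facts taken from this document are that the summary is injected with role "user" and that everything from firstKeptEntryId onward is retained.

```typescript
// Hypothetical, simplified message shape.
interface Message { id: string; role: "user" | "assistant" | "toolResult"; content: string; }

function buildContextAfterCompaction(
  entries: Message[],
  firstKeptEntryId: string,
  summaryText: string,
): Message[] {
  const cut = entries.findIndex((entry) => entry.id === firstKeptEntryId);
  // Unknown id: leave the context untouched rather than dropping history (assumption).
  if (cut < 0) return entries;

  // The summary takes the place of everything before the cut point.
  const summaryMessage: Message = { id: "compaction-summary", role: "user", content: summaryText };
  return [summaryMessage, ...entries.slice(cut)];
}
```

The system prompt is not part of this list; it is prepended separately on each request, as in the layout above.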


Edge Cases

| Scenario | Handling |
| --- | --- |
| Already compacted (last entry is compaction) | prepareCompaction() returns undefined → skip |
| No valid cut point found | Returns undefined → no compaction |
| Context overflow from different model | Ignored: sameModel check on provider + model fields |
| Overflow after recent compaction | Ignored: error timestamp < compaction timestamp |
| Second overflow after recovery | Aborts with user message: "try reducing context or switching to a larger-context model" |
| Split turn (turn too large for keepRecentTokens) | Separate turn-prefix summary generated, merged into main summary |
| customInstructions set | Appended to whichever prompt is used: \n\nAdditional focus: {instructions} |
| Extension overrides compaction | session_before_compact handler provides custom CompactionResult |
| Reasoning model | completeSimple called with reasoning: "high" for the summarization API call |

Key Files

| File | Purpose |
| --- | --- |
| src/core/compaction/compaction.ts | Core: shouldCompact(), prepareCompaction(), compact(), generateSummary(), all prompts |
| src/core/compaction/utils.ts | SUMMARIZATION_SYSTEM_PROMPT, serializeConversation(), file op tracking |
| src/core/compaction/branch-summarization.ts | Branch-point summary (separate from session compaction) |
| src/core/agent-session.ts | _checkCompaction(): trigger logic, overflow handling, extension hooks |
| src/core/settings-manager.ts | getCompactionSettings(): reserveTokens, keepRecentTokens, enabled |
| src/modes/interactive/components/compaction-summary-message.ts | TUI rendering of compaction summary |

Pi Compression Prompts

Repo: https://github.com/badlogic/pi-mono
Commit: f430dce
Key files:

  • packages/coding-agent/src/core/compaction/compaction.ts: main logic, prompts
  • packages/coding-agent/src/core/compaction/utils.ts: SUMMARIZATION_SYSTEM_PROMPT, serialization
  • packages/coding-agent/src/core/agent-session.ts: _checkCompaction(), trigger logic
  • packages/coding-agent/src/core/settings-manager.ts: threshold config

Three-Layer Prompt Architecture

Pi makes a fresh LLM call for compaction. The summarization request is sent as a single user message containing: the serialized conversation, (optionally) the previous summary, and the condensation instructions.


Layer 1 — System prompt (SUMMARIZATION_SYSTEM_PROMPT)

src/core/compaction/utils.ts

You are a context summarization assistant. Your task is to read a conversation between a user and an AI coding assistant, then produce a structured summary following the exact format specified.

Do NOT continue the conversation. Do NOT respond to any questions in the conversation. ONLY output the structured summary.

Layer 2a — Initial compaction prompt (SUMMARIZATION_PROMPT)

src/core/compaction/compaction.ts, used when there is no previous summary.

The full user message sent for summarization:

<conversation>
[User]: ...
[Assistant thinking]: ...
[Assistant]: ...
[Assistant tool calls]: toolName(arg=val)
[Tool result]: ...
... (full serialized conversation)
</conversation>

The messages above are a conversation to summarize. Create a structured context checkpoint summary that another LLM will use to continue the work.

Use this EXACT format:

## Goal
[What is the user trying to accomplish? Can be multiple items if the session covers different tasks.]

## Constraints & Preferences
- [Any constraints, preferences, or requirements mentioned by user]
- [Or "(none)" if none were mentioned]

## Progress
### Done
- [x] [Completed tasks/changes]

### In Progress
- [ ] [Current work]

### Blocked
- [Issues preventing progress, if any]

## Key Decisions
- **[Decision]**: [Brief rationale]

## Next Steps
1. [Ordered list of what should happen next]

## Critical Context
- [Any data, examples, or references needed to continue]
- [Or "(none)" if not applicable]

Keep each section concise. Preserve exact file paths, function names, and error messages.

Optional suffix (when customInstructions set): \n\nAdditional focus: {customInstructions}


Layer 2b — Incremental update prompt (UPDATE_SUMMARIZATION_PROMPT)

src/core/compaction/compaction.ts, used when a previous summary exists.

<conversation>
[User]: ...
... (only NEW messages since last compaction)
</conversation>

<previous-summary>
{previous_summary_text}
</previous-summary>

The messages above are NEW conversation messages to incorporate into the existing summary provided in <previous-summary> tags.

Update the existing structured summary with new information. RULES:
- PRESERVE all existing information from the previous summary
- ADD new progress, decisions, and context from the new messages
- UPDATE the Progress section: move items from "In Progress" to "Done" when completed
- UPDATE "Next Steps" based on what was accomplished
- PRESERVE exact file paths, function names, and error messages
- If something is no longer relevant, you may remove it

Use this EXACT format:

## Goal
[Preserve existing goals, add new ones if the task expanded]

## Constraints & Preferences
- [Preserve existing, add new ones discovered]

## Progress
### Done
- [x] [Include previously done items AND newly completed items]

### In Progress
- [ ] [Current work - update based on progress]

### Blocked
- [Current blockers - remove if resolved]

## Key Decisions
- **[Decision]**: [Brief rationale] (preserve all previous, add new)

## Next Steps
1. [Update based on current state]

## Critical Context
- [Preserve important context, add new if needed]

Keep each section concise. Preserve exact file paths, function names, and error messages.

Layer 3 — Turn-prefix summary (TURN_PREFIX_SUMMARIZATION_PROMPT)

src/core/compaction/compaction.ts, used only when a split-turn compaction occurs.

When the cut point falls in the middle of a turn (the turn is too large to fit in keepRecentTokens), a separate smaller summary is generated for the prefix of that turn, then merged into the main summary:

<conversation>
[prefix messages of the oversized turn]
</conversation>

This is the PREFIX of a turn that was too large to keep. The SUFFIX (recent work) is retained.

Summarize the prefix to provide context for the retained suffix:

## Original Request
[What did the user ask for in this turn?]

## Early Progress
- [Key decisions and work done in the prefix]

## Context for Suffix
- [Information needed to understand the retained recent work]

Be concise. Focus on what's needed to understand the kept suffix.

Max tokens budget for turn-prefix summary: 0.5 × reserveTokens (half the normal budget).


Summary Output Format

## Goal
...

## Constraints & Preferences
- ...

## Progress
### Done
- [x] ...

### In Progress
- [ ] ...

### Blocked
...

## Key Decisions
- **Decision**: rationale

## Next Steps
1. ...

## Critical Context
- ...

<read-files>
path/to/file1.ts
path/to/file2.ts
</read-files>

<modified-files>
path/to/modified.ts
</modified-files>

The <read-files> and <modified-files> XML blocks are appended automatically by the harness (not generated by the LLM) from tracked tool call history.


Conversation Serialization

Before being sent to the summarization LLM, the conversation is flattened to a plain-text format (serializeConversation()):

[User]: user message text

[Assistant thinking]: extended thinking text

[Assistant]: response text

[Assistant tool calls]: toolName(arg1="val1", arg2="val2")

[Tool result]: tool output text

Images and other non-text content are dropped. Flattening the history into a single labelled transcript, instead of passing it as real chat turns, prevents the summarizing model from treating the content as a live conversation to continue.
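A simplified serializer producing the format above might look like the following. The block and message types are hypothetical stand-ins for Pi's own message model, and the separators used when a message has several text blocks or several tool calls are assumptions.

```typescript
// Hypothetical content-block and message shapes.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "toolCall"; name: string; arguments: Record<string, unknown> }
  | { type: "image"; data: string };

interface Msg { role: "user" | "assistant" | "toolResult"; content: Block[]; }

function serializeConversation(messages: Msg[]): string {
  const parts: string[] = [];
  for (const msg of messages) {
    const texts: string[] = [];
    const thinking: string[] = [];
    const calls: string[] = [];
    for (const block of msg.content) {
      if (block.type === "text") {
        texts.push(block.text);
      } else if (block.type === "thinking") {
        thinking.push(block.thinking);
      } else if (block.type === "toolCall") {
        const callArgs = block.arguments;
        const rendered = Object.keys(callArgs)
          .map((key) => `${key}=${JSON.stringify(callArgs[key])}`)
          .join(", ");
        calls.push(`${block.name}(${rendered})`);
      }
      // Images and any other non-text blocks are dropped.
    }
    if (msg.role === "user") {
      if (texts.length > 0) parts.push(`[User]: ${texts.join("\n")}`);
    } else if (msg.role === "toolResult") {
      if (texts.length > 0) parts.push(`[Tool result]: ${texts.join("\n")}`);
    } else {
      if (thinking.length > 0) parts.push(`[Assistant thinking]: ${thinking.join("\n")}`);
      if (texts.length > 0) parts.push(`[Assistant]: ${texts.join("\n")}`);
      if (calls.length > 0) parts.push(`[Assistant tool calls]: ${calls.join("; ")}`);
    }
  }
  // Blank line between entries, matching the layout shown above.
  return parts.join("\n\n");
}
```

Because the output is one string wrapped in conversation tags, the summarizer sees a document to describe rather than a chat history whose last turn invites a reply.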
