Repo: https://github.com/badlogic/pi-mono
Commit: f430dce
Language: TypeScript (terminal coding agent)
Pi calls its compression system "compaction". It's a full extract mechanism with incremental summary updating. Old messages are discarded; new effective context = compaction summary message + recent messages. The summary is written to a persistent session file (session.jsonl), and the agent is reloaded from that checkpoint. A key differentiator: Pi uses two different prompts — an initial prompt for first compaction and an update prompt for subsequent compactions that explicitly instructs the model to merge with the previous summary.
src/core/compaction/compaction.ts: compact(), prepareCompaction(), generateSummary()
Full extract, not a piggyback. After compaction:
- Messages before firstKeptEntryId are removed from the in-memory context
- A CompactionEntry (with the summary text) is written to session.jsonl
- The session is reloaded: context = [compactionSummaryMessage] + [kept recent messages]
Iterative update: If a previous compaction exists, generateSummary() receives previousSummary and uses the UPDATE_SUMMARIZATION_PROMPT which explicitly instructs the model to merge the prior summary with the new messages, not start fresh. This avoids re-summarising the full history each time.
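The prompt selection described above can be sketched roughly as follows. This is an illustration, not Pi's code: the prompt strings are placeholders, and `INITIAL_SUMMARIZATION_PROMPT`, `buildSummaryRequest`, and the `<previous-summary>` wrapper are hypothetical names chosen for the example (only `UPDATE_SUMMARIZATION_PROMPT` and `previousSummary` appear in the source).

```typescript
// Placeholder prompt text; the real prompts live in compaction.ts.
const INITIAL_SUMMARIZATION_PROMPT = "Summarize the conversation so far."; // hypothetical name
const UPDATE_SUMMARIZATION_PROMPT =
  "Merge the new messages into the previous summary instead of starting fresh.";

interface SummaryRequest {
  prompt: string;
  // Text handed to the model: prior summary (if any) followed by the new messages.
  input: string;
}

// Hypothetical helper: choose the prompt based on whether a prior summary exists.
function buildSummaryRequest(newMessages: string[], previousSummary?: string): SummaryRequest {
  const conversation = newMessages.join("\n");
  if (previousSummary === undefined) {
    // First compaction: summarize from scratch.
    return { prompt: INITIAL_SUMMARIZATION_PROMPT, input: conversation };
  }
  // Later compactions: only the new messages plus the old summary are sent.
  // The <previous-summary> wrapper is an assumption made for this sketch.
  return {
    prompt: UPDATE_SUMMARIZATION_PROMPT,
    input: `<previous-summary>\n${previousSummary}\n</previous-summary>\n\n${conversation}`,
  };
}

const first = buildSummaryRequest(["user: fix the bug", "assistant: done"]);
const second = buildSummaryRequest(["user: add tests"], "Bug in parser was fixed.");
```

The point of the second branch is that the already-summarized history is never re-read; only the previous summary and the messages since the last compaction reach the summarizer.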
src/core/compaction/compaction.ts: shouldCompact()
src/core/settings-manager.ts: getCompactionSettings()

    export function shouldCompact(contextTokens: number, contextWindow: number, settings: CompactionSettings): boolean {
      if (!settings.enabled) return false;
      return contextTokens > contextWindow - settings.reserveTokens;
    }

Compaction fires when: contextTokens > contextWindow - reserveTokens
Defaults:
- reserveTokens = 16384: reserved headroom (for the system prompt + LLM response)
- keepRecentTokens = 20000: how much recent context to preserve after compaction
- enabled = true
So with a 200k-token model (e.g. Claude Sonnet), compaction fires at 200,000 - 16,384 = 183,616 tokens, roughly 92% fill.
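As a quick check of that arithmetic, the snippet below reuses shouldCompact() as quoted and plugs in the default settings. The `CompactionSettings` interface shape is assumed from the three defaults listed above; the variable names are illustrative only.

```typescript
// Assumed shape, inferred from the three documented defaults.
interface CompactionSettings {
  enabled: boolean;
  reserveTokens: number;
  keepRecentTokens: number;
}

function shouldCompact(contextTokens: number, contextWindow: number, settings: CompactionSettings): boolean {
  if (!settings.enabled) return false;
  return contextTokens > contextWindow - settings.reserveTokens;
}

const defaults: CompactionSettings = { enabled: true, reserveTokens: 16384, keepRecentTokens: 20000 };
const contextWindow = 200000;

// Threshold and fill ratio for a 200k-token window.
const threshold = contextWindow - defaults.reserveTokens; // 183616
const fillRatio = threshold / contextWindow; // 0.91808

// The comparison is strict, so sitting exactly on the threshold does not trigger.
const atThreshold = shouldCompact(threshold, contextWindow, defaults); // false
const justOver = shouldCompact(threshold + 1, contextWindow, defaults); // true
const whenDisabled = shouldCompact(199999, contextWindow, { ...defaults, enabled: false }); // false
```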
Pi uses the actual reported API usage from the last assistant message (calculateContextTokens(usage)) rather than counting tokens locally. If no usage is available (e.g. no assistant turn yet), it falls back to a chars / 4 character heuristic per message block.
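A minimal sketch of that two-tier estimate, under stated assumptions: the `Usage` and `Message` shapes are simplified stand-ins, `estimateContextTokens` and `estimateTokensFromChars` are hypothetical helper names, and rounding up per message is a choice made for this example. The body of `calculateContextTokens` follows the formula given in these notes, not a verified copy of Pi's function.

```typescript
// Simplified stand-in for the usage record attached to an assistant message.
interface Usage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  totalTokens?: number;
}

interface Message {
  text: string;
  usage?: Usage; // only present on assistant messages that reported usage
}

// Prefer the provider-reported total; otherwise sum the individual counters.
function calculateContextTokens(usage: Usage): number {
  return usage.totalTokens || usage.input + usage.output + usage.cacheRead + usage.cacheWrite;
}

// Fallback heuristic: roughly 4 characters per token, rounded up per message (assumption).
function estimateTokensFromChars(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.text.length / 4), 0);
}

// Hypothetical wrapper: use the most recent reported usage if any, else the heuristic.
function estimateContextTokens(messages: Message[]): number {
  for (let i = messages.length - 1; i >= 0; i--) {
    const usage = messages[i]?.usage;
    if (usage) return calculateContextTokens(usage);
  }
  return estimateTokensFromChars(messages);
}
```

Relying on reported usage avoids shipping a tokenizer per provider; the heuristic only matters before the first assistant turn.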
All three settings are user-configurable in settings.json, either via /settings in the UI or by editing the file directly.
Two trigger points in _checkCompaction():
agent-session.ts, ~line 447
After every agent_end event, if the last assistant message has valid usage data:
contextTokens = usage.totalTokens || (input + output + cacheRead + cacheWrite)
if contextTokens > contextWindow - reserveTokens → auto-compact
agent-session.ts, ~line 888
Before sending each user message, the last assistant message is re-checked. If it was a context overflow error, compaction is triggered immediately with willRetry = true (the failed request is retried after compaction).
Single overflow recovery attempt: _overflowRecoveryAttempted flag prevents infinite loops. If a second overflow occurs after recovery, the user is told to reduce context or switch model.
Model-change guard: Overflow errors from a different model (e.g. user switched from Opus to Sonnet) are ignored — the error's provider and model fields must match the current model.
Post-compaction guard: If the overflow error timestamp predates the latest compaction, it's ignored (the error is in the preserved region and shouldn't trigger a new compaction).
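The three guards can be combined into one decision function, sketched below. All names here (`decideOverflowAction`, `RecoveryState`, the decision strings) are hypothetical, and the order in which the guards are checked is an assumption; the source only states that each guard exists.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// An overflow error as recorded on the failed assistant message (simplified).
interface OverflowError extends ModelRef {
  timestamp: number;
}

interface RecoveryState {
  currentModel: ModelRef;
  lastCompactionTimestamp?: number;
  overflowRecoveryAttempted: boolean;
}

type OverflowDecision = "compact-and-retry" | "ignore" | "give-up";

function decideOverflowAction(error: OverflowError, state: RecoveryState): OverflowDecision {
  // Model-change guard: the error must come from the currently selected model.
  const sameModel =
    error.provider === state.currentModel.provider && error.model === state.currentModel.model;
  if (!sameModel) return "ignore";

  // Post-compaction guard: errors older than the latest compaction are stale.
  if (state.lastCompactionTimestamp !== undefined && error.timestamp < state.lastCompactionTimestamp) {
    return "ignore";
  }

  // Single recovery attempt: a second overflow is surfaced to the user instead.
  if (state.overflowRecoveryAttempted) return "give-up";
  state.overflowRecoveryAttempted = true;
  return "compact-and-retry";
}
```

In this sketch "compact-and-retry" corresponds to compaction with willRetry = true, and "give-up" to the message asking the user to reduce context or switch model.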
src/core/slash-commands.ts
The user can trigger compaction manually at any time. The command accepts an optional customInstructions string, which is appended to the summarization prompt.
src/core/compaction/compaction.ts: findCutPoint(), findValidCutPoints()
After deciding to compact, Pi must choose where to cut: which messages to summarize vs. which to keep.
Goal: keep approximately keepRecentTokens (default 20k tokens) of recent context.
Algorithm:
- Collect all valid cut points (indices of user, assistant, custom, bashExecution, branchSummary, compactionSummary messages; never cut at toolResult since it must follow its tool call)
- Walk backward from the end, accumulating estimated tokens
- Stop when accumulated tokens exceed keepRecentTokens → that's the cut point
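The steps above can be sketched as follows. This is a simplified re-implementation for illustration, not the code in compaction.ts: the `Entry` shape and the chars / 4 estimator are stand-ins, and two behaviours are assumptions of the sketch (snapping forward to the nearest valid cut point when the walk stops on a toolResult, and returning the first valid cut point when everything fits in the budget).

```typescript
type EntryKind =
  | "user"
  | "assistant"
  | "custom"
  | "bashExecution"
  | "branchSummary"
  | "compactionSummary"
  | "toolResult";

interface Entry {
  kind: EntryKind;
  text: string;
}

// Stand-in estimator: roughly 4 characters per token.
const estimateTokens = (entry: Entry): number => Math.ceil(entry.text.length / 4);

// Every entry type except toolResult may begin the kept region.
function findValidCutPoints(entries: Entry[]): number[] {
  const points: number[] = [];
  entries.forEach((entry, index) => {
    if (entry.kind !== "toolResult") points.push(index);
  });
  return points;
}

// Returns the index of the first kept entry, or undefined if no valid cut exists.
function findCutPoint(entries: Entry[], keepRecentTokens: number): number | undefined {
  const cutPoints = findValidCutPoints(entries);
  if (cutPoints.length === 0) return undefined;

  let accumulated = 0;
  for (let i = entries.length - 1; i >= 0; i--) {
    accumulated += estimateTokens(entries[i]);
    if (accumulated > keepRecentTokens) {
      // Assumption: snap forward so a toolResult is never separated from its call.
      return cutPoints.find((point) => point >= i);
    }
  }
  // Assumption: the whole history fits in the budget, so keep from the first valid point.
  return cutPoints[0];
}
```

Walking backward means the kept region is sized by the budget alone, regardless of how long the discarded history is.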
Split-turn handling: If the natural cut falls in the middle of a turn (the user message starting that turn would be excluded), Pi detects isSplitTurn = true and:
- Generates a separate turn-prefix summary for the excluded portion of that turn
- Merges it into the main summary:
{main_summary}\n\n---\n\n**Turn Context (split turn):**\n\n{prefix_summary}
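A rough sketch of split-turn detection and the merge step. `detectSplitTurn` and `mergeSplitTurnSummary` are hypothetical names, and the sketch assumes a turn always begins at a user message; the merge template is the one quoted above.

```typescript
type Role = "user" | "assistant" | "toolResult";

interface SplitInfo {
  isSplitTurn: boolean;
  // Index of the user message that started the split turn, if any.
  turnStartIndex: number | undefined;
}

// A cut that does not land on a user message leaves part of a turn behind.
function detectSplitTurn(roles: Role[], cutIndex: number): SplitInfo {
  if (roles[cutIndex] === "user") return { isSplitTurn: false, turnStartIndex: undefined };
  for (let i = cutIndex - 1; i >= 0; i--) {
    if (roles[i] === "user") return { isSplitTurn: true, turnStartIndex: i };
  }
  return { isSplitTurn: false, turnStartIndex: undefined };
}

// Append the turn-prefix summary to the main summary using the documented template.
function mergeSplitTurnSummary(mainSummary: string, prefixSummary?: string): string {
  if (!prefixSummary) return mainSummary;
  return `${mainSummary}\n\n---\n\n**Turn Context (split turn):**\n\n${prefixSummary}`;
}
```

In this sketch the entries from turnStartIndex up to (but not including) cutIndex are what the turn-prefix summary would cover.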
src/core/compaction/utils.ts: extractFileOpsFromMessage(), computeFileLists()
Pi tracks all file operations across the session and appends them to the summary:
- read: files passed to the read tool
- written: files passed to the write tool
- edited: files passed to the edit tool
On each compaction, file ops are accumulated from:
- The previous compaction's details.readFiles / details.modifiedFiles (carried forward)
- Tool calls in the messages being summarized
Final lists are deduplicated: modifiedFiles = written ∪ edited, readFiles = read \ modifiedFiles.
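The set arithmetic is small enough to show directly. This is a sketch: the `FileOps` shape and the function signature are assumptions (the real computeFileLists() in utils.ts may differ), and sorting the output is a choice made here for stable results.

```typescript
interface FileOps {
  read: Set<string>;
  written: Set<string>;
  edited: Set<string>;
}

interface FileLists {
  readFiles: string[];
  modifiedFiles: string[];
}

// modifiedFiles = written ∪ edited; readFiles = read \ modifiedFiles.
function computeFileLists(ops: FileOps): FileLists {
  const modified = new Set<string>();
  ops.written.forEach((file) => modified.add(file));
  ops.edited.forEach((file) => modified.add(file));

  // A file that was read and later modified is reported only as modified.
  const readOnly = Array.from(ops.read).filter((file) => !modified.has(file));

  return {
    readFiles: readOnly.sort(),
    modifiedFiles: Array.from(modified).sort(),
  };
}
```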
Appended to summary as XML:

    <read-files>
    src/foo.ts
    src/bar.ts
    </read-files>
    <modified-files>
    src/baz.ts
    </modified-files>

This gives the next model awareness of file history even after the actual tool calls are gone.
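Rendering the two lists into that XML shape might look like the following. `formatFileLists` is a hypothetical helper name, and omitting a section when its list is empty is an assumption of this sketch.

```typescript
// Hypothetical helper: render the file lists in the XML layout shown above.
function formatFileLists(readFiles: string[], modifiedFiles: string[]): string {
  const sections: string[] = [];
  if (readFiles.length > 0) {
    sections.push(`<read-files>\n${readFiles.join("\n")}\n</read-files>`);
  }
  if (modifiedFiles.length > 0) {
    sections.push(`<modified-files>\n${modifiedFiles.join("\n")}\n</modified-files>`);
  }
  // Assumption: empty lists produce no tags at all.
  return sections.join("\n");
}
```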
agent-session.ts: session_before_compact event
Pi has an extension system. Before compaction runs, the session_before_compact event is emitted. Extensions can return a compaction result to override the built-in compaction entirely. The built-in compact() function only runs if no extension provides a result. This allows custom summarization strategies (e.g. structured compaction for specific workflows).
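The override pattern can be sketched as below. This is not Pi's extension API: the `CompactionResult` fields, the handler signature, and `runCompaction` are hypothetical, handlers are synchronous here for brevity (the real hook is presumably asynchronous), and letting the first handler that returns a result win is an assumption.

```typescript
// Assumed minimal result shape for the sketch.
interface CompactionResult {
  summary: string;
  firstKeptEntryId: string;
}

// A handler may return a result to take over compaction, or undefined to decline.
type BeforeCompactHandler = (entryIds: string[]) => CompactionResult | undefined;

function runCompaction(
  handlers: BeforeCompactHandler[],
  builtInCompact: (entryIds: string[]) => CompactionResult,
  entryIds: string[],
): CompactionResult {
  for (const handler of handlers) {
    const result = handler(entryIds);
    if (result) return result; // an extension supplied the compaction
  }
  // No extension provided a result: fall back to the built-in path.
  return builtInCompact(entryIds);
}
```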
src/core/compaction/branch-summarization.ts
Separate from session compaction. When a user creates a new "branch" in the conversation (Pi has a branching UI), a branch summary is generated for the diverging point. Same prompt structure (uses SUMMARIZATION_SYSTEM_PROMPT), but focused on capturing the branch context.
After compaction, the session is reloaded. The effective context seen by the model:
[system prompt]
[CompactionSummaryMessage] ← role: "user", contains summary text + <read-files> + <modified-files>
[kept recent messages] ← last ~keepRecentTokens tokens
The CompactionSummaryMessage is a special message type rendered distinctly in the TUI (via compaction-summary-message.ts component).
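Assembling that post-compaction context is a simple concatenation, sketched here under assumptions: `ContextMessage` is a simplified message type, `buildContextAfterCompaction` is a hypothetical helper, and the system prompt is treated as being supplied separately from the message list.

```typescript
interface ContextMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical helper: the summary becomes one user-role message ahead of the kept tail.
function buildContextAfterCompaction(
  summary: string,
  fileListsXml: string,
  keptMessages: ContextMessage[],
): ContextMessage[] {
  // The file lists ride along inside the same message as the summary text.
  const content = fileListsXml ? `${summary}\n\n${fileListsXml}` : summary;
  const summaryMessage: ContextMessage = { role: "user", content };
  return [summaryMessage, ...keptMessages];
}
```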
| Scenario | Handling |
|---|---|
| Already compacted (last entry is compaction) | prepareCompaction() returns undefined → skip |
| No valid cut point found | Returns undefined → no compaction |
| Context overflow from different model | Ignored: sameModel check on provider + model fields |
| Overflow after recent compaction | Ignored: error timestamp < compaction timestamp |
| Second overflow after recovery | Aborts with user message: "try reducing context or switching to a larger-context model" |
| Split turn (turn too large for keepRecentTokens) | Separate turn-prefix summary generated, merged into main summary |
| customInstructions set | Appended to whichever prompt is used: \n\nAdditional focus: {instructions} |
| Extension overrides compaction | session_before_compact handler provides custom CompactionResult |
| Reasoning model | completeSimple called with reasoning: "high" for the summarization API call |
| File | Purpose |
|---|---|
| src/core/compaction/compaction.ts | Core: shouldCompact(), prepareCompaction(), compact(), generateSummary(), all prompts |
| src/core/compaction/utils.ts | SUMMARIZATION_SYSTEM_PROMPT, serializeConversation(), file op tracking |
| src/core/compaction/branch-summarization.ts | Branch-point summary (separate from session compaction) |
| src/core/agent-session.ts | _checkCompaction(): trigger logic, overflow handling, extension hooks |
| src/core/settings-manager.ts | getCompactionSettings(): reserveTokens, keepRecentTokens, enabled |
| src/modes/interactive/components/compaction-summary-message.ts | TUI rendering of compaction summary |

    {
      "compaction": {
        "enabled": true,            // master toggle
        "reserveTokens": 16384,     // headroom to leave at top
        "keepRecentTokens": 20000   // how much recent history to keep
      }
    }