@sam-saffron-jarvis
Created March 4, 2026 20:33
Codex compression/compaction: prompts and full analysis

Codex Compression (Compaction) — Deep Analysis

Repo: https://github.com/openai/codex
Language: Rust (codex-rs/)
Key files:

  • core/src/compact.rs — local (non-OpenAI) compaction logic
  • core/src/compact_remote.rs — remote (OpenAI) compaction logic
  • core/src/tasks/compact.rs — task dispatcher
  • core/templates/compact/prompt.md — the summarization prompt
  • core/templates/compact/summary_prefix.md — prepended to stored summary
  • codex-api/src/endpoint/compact.rs — API client for remote endpoint
  • protocol/src/openai_models.rs:276 — threshold computation
  • core/src/codex.rs:4895–5270 — trigger logic

1. Compression Prompt

Source: core/templates/compact/prompt.md (embedded via include_str!)

You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.

Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue

Be concise, structured, and focused on helping the next LLM seamlessly continue the work.

Summary prefix — prepended to the model's reply before storing
Source: core/templates/compact/summary_prefix.md

Another language model started to solve this problem and produced a summary of its thinking process. You also have access to the state of the tools that were used by that language model. Use this to build on the work that has already been done and avoid duplicating work. Here is the summary produced by the other language model, use the information in this summary to assist with your own analysis:

This prefix is only used in the local (non-OpenAI) path. It becomes the preamble of the summary user-message injected into the replacement history.


2. Threshold — What Triggers Compaction

Metric: Cumulative token count (total_usage_tokens) from the last API response.
Formula (protocol/src/openai_models.rs:276-287):

pub fn auto_compact_token_limit(&self) -> Option<i64> {
    let context_limit = self
        .context_window
        .map(|context_window| (context_window * 9) / 10);  // 90% of context window
    let config_limit = self.auto_compact_token_limit;
    if let Some(context_limit) = context_limit {
        return Some(
            config_limit.map_or(context_limit, |limit| std::cmp::min(limit, context_limit)),
        );
    }
    config_limit
}
  • Default: 90% of the model's context window (e.g., 180K for a 200K model)
  • User override: model_auto_compact_token_limit in config.toml — capped at the 90% ceiling; can only lower it, not raise it

If neither context_window nor model_auto_compact_token_limit is set, auto_compact_token_limit() returns None, and the caller falls back to i64::MAX — meaning compaction never fires automatically.
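To make the arithmetic concrete, here is a small standalone sketch of the same rule. It is not the real method: the free function `effective_compact_limit` and its two parameters are hypothetical stand-ins for the `context_window` and `auto_compact_token_limit` fields used above.

```rust
/// Standalone version of the threshold rule (hypothetical free function;
/// the real code is a method that reads these values from `self`).
fn effective_compact_limit(
    context_window: Option<i64>,
    config_limit: Option<i64>,
) -> Option<i64> {
    // 90% of the context window, using the same integer arithmetic as above.
    let ceiling = context_window.map(|window| (window * 9) / 10);
    match (ceiling, config_limit) {
        // Both known: the configured limit can only lower the ceiling.
        (Some(ceiling), Some(limit)) => Some(limit.min(ceiling)),
        // Only the window is known: use the 90% ceiling.
        (Some(ceiling), None) => Some(ceiling),
        // No window: whatever the config says, possibly nothing at all.
        (None, limit) => limit,
    }
}
```

With a 200,000-token window, a configured limit of 100,000 is honored as-is, while a configured limit of 190,000 is clamped down to 180,000.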

Two Trigger Points

  1. Pre-sampling / pre-turn (codex.rs:5250-5270): Before processing the new user turn, if total_usage_tokens >= auto_compact_limit, compaction runs with InitialContextInjection::DoNotInject. History is replaced, and the next turn reinjects all context fresh.

  2. Mid-turn (codex.rs:5122-5150): After a sampling response completes, if token_limit_reached && needs_follow_up (model wants to continue tool calling), compaction runs with InitialContextInjection::BeforeLastUserMessage — the last user message is preserved as the final item.
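The two checks can be sketched as a pair of pure functions. This is a simplified illustration, not the code in codex.rs: the function names are hypothetical, and the local `InitialContextInjection` enum only mirrors the two variant names mentioned above.

```rust
/// Mirrors the two injection modes named in the text (local stand-in type).
#[derive(Debug, PartialEq)]
enum InitialContextInjection {
    DoNotInject,
    BeforeLastUserMessage,
}

/// Pre-turn check (hypothetical helper): compact before the new user turn
/// when the running token count has reached the limit.
fn pre_turn_compaction(
    total_usage_tokens: i64,
    auto_compact_limit: i64,
) -> Option<InitialContextInjection> {
    (total_usage_tokens >= auto_compact_limit)
        .then_some(InitialContextInjection::DoNotInject)
}

/// Mid-turn check (hypothetical helper): compact only if the limit was hit
/// and the model still wants to continue, for example with more tool calls.
fn mid_turn_compaction(
    token_limit_reached: bool,
    needs_follow_up: bool,
) -> Option<InitialContextInjection> {
    (token_limit_reached && needs_follow_up)
        .then_some(InitialContextInjection::BeforeLastUserMessage)
}
```

Note that when the limit falls back to `i64::MAX`, the pre-turn check can never succeed, which matches the "never fires automatically" behavior described earlier.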


3. Mechanism — Extract, Not Piggyback

Codex uses full extraction and replacement. The history is completely discarded and rebuilt; it is NOT a piggyback that appends a summary to the existing history.

Local path (non-OpenAI providers) — compact.rs

  1. Appends SUMMARIZATION_PROMPT as a new user message to the current history
  2. Streams a standard response request to the same model at the same endpoint
  3. Extracts the model's last assistant message as the summary text
  4. Prepends SUMMARY_PREFIX to get summary_text
  5. Calls collect_user_messages() — extracts all raw user messages from history, filtering out previous summaries
  6. Calls build_compacted_history(), which walks the user messages newest first until a 20,000-token budget is used up, then reverses the selection back into chronological order
  7. Replacement history: [recent_user_msgs..., summary_as_user_message]
  8. Calls sess.replace_compacted_history(...) — full swap

The summary is stored as a user-role message (not system/assistant/developer).
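Steps 6 and 7 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in compact.rs: `approx_tokens` (about 4 bytes per token) and `build_replacement_history` are hypothetical names, messages are plain strings, and the sketch simply stops at the first message that does not fit the budget, which may differ from how the real function treats an oversized message.

```rust
/// Hypothetical token estimate: roughly 4 bytes per token (an assumption).
fn approx_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Keep the newest user messages that fit in `max_tokens`, restore
/// chronological order, then append the summary as the final item.
fn build_replacement_history(
    user_messages: &[String],
    summary_text: &str,
    max_tokens: usize,
) -> Vec<String> {
    let mut remaining = max_tokens;
    let mut selected: Vec<String> = Vec::new();
    // Walk newest first so the most recent messages win the budget.
    for message in user_messages.iter().rev() {
        let cost = approx_tokens(message);
        if cost > remaining {
            break; // simplification: stop at the first message that does not fit
        }
        remaining -= cost;
        selected.push(message.clone());
    }
    // Selection was newest first; flip it back to chronological order.
    selected.reverse();
    // The summary goes last, as a user-role message in the real history.
    selected.push(summary_text.to_string());
    selected
}
```

In the real code the budget would be COMPACT_USER_MESSAGE_MAX_TOKENS (20,000); the parameter is left open here so the behavior is easy to try with small inputs.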

Remote path (OpenAI providers) — compact_remote.rs

  1. Sends full history + base instructions to POST /v1/responses/compact (a proprietary OpenAI API endpoint)
  2. Gets back { "output": Vec<ResponseItem> } — non-streaming JSON
  3. Filters the response: drops developer messages and user messages that carry no real user content; keeps assistant messages, real user messages, and encrypted Compaction items
  4. Calls sess.replace_compacted_history(...) — full swap
  5. CompactedItem.message is set to "" (empty) — no local summary text
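The filtering in step 3 can be pictured with a simplified item type. Everything in this sketch is a hypothetical stand-in: the `Item` enum, its `real_user_content` flag, and both function names are invented for illustration and do not reflect the actual `ResponseItem` definition.

```rust
/// Simplified stand-in for the items returned by the compact endpoint.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Developer(String),
    /// `real_user_content` is a hypothetical flag marking genuine user input.
    User { text: String, real_user_content: bool },
    Assistant(String),
    Compaction { encrypted_content: String },
}

/// Decide whether an item survives into the replacement history.
fn keep_after_remote_compact(item: &Item) -> bool {
    match item {
        // Developer messages are dropped.
        Item::Developer(_) => false,
        // User messages are kept only when they carry real user content.
        Item::User { real_user_content, .. } => *real_user_content,
        // Assistant messages and encrypted compaction items are kept.
        Item::Assistant(_) | Item::Compaction { .. } => true,
    }
}

/// Apply the filter to the whole `output` array.
fn filter_remote_output(output: Vec<Item>) -> Vec<Item> {
    output.into_iter().filter(keep_after_remote_compact).collect()
}
```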

Which path is chosen: compact.rs:50-52

pub(crate) fn should_use_remote_compact_task(provider: &ModelProviderInfo) -> bool {
    provider.is_openai()
}

4. Edge Cases

First turn is extremely long (> context window)

During compaction, if ContextWindowExceeded fires and turn_input_len > 1:

  • Iteratively removes the oldest history item (history.remove_first_item()) and retries
  • User sees: "Trimmed N older thread item(s) before compacting so the prompt fits the model context window."

If turn_input_len == 1 (nothing left to trim):

  • set_total_tokens_full() is called (marks context as full)
  • EventMsg::Error is emitted and the turn is aborted
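A minimal sketch of the trim-and-retry loop for the local path, assuming history items are plain strings and the compaction attempt is passed in as a closure. `CompactError` and `compact_with_trimming` are hypothetical names; the real code works on session history and emits events rather than returning a count.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum CompactError {
    ContextWindowExceeded,
    Other(String),
}

/// Retry `attempt`, dropping the oldest item each time the prompt is too big.
/// Returns the summary plus the number of items trimmed along the way.
fn compact_with_trimming<F>(
    history: &mut Vec<String>,
    mut attempt: F,
) -> Result<(String, usize), CompactError>
where
    F: FnMut(&[String]) -> Result<String, CompactError>,
{
    let mut trimmed = 0;
    loop {
        match attempt(history.as_slice()) {
            Ok(summary) => return Ok((summary, trimmed)),
            // Still something to trim: drop the oldest item and try again.
            Err(CompactError::ContextWindowExceeded) if history.len() > 1 => {
                history.remove(0);
                trimmed += 1;
            }
            // Nothing left to trim, or a different failure: give up.
            Err(err) => return Err(err),
        }
    }
}
```

The returned count corresponds to the N in the "Trimmed N older thread item(s)" message; the error branch is where the real code marks the context as full and aborts the turn.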

For remote compaction (compact_remote.rs:256-283):

  • trim_function_call_history_to_fit_context_window() runs before the API call
  • Removes codex-generated items (FunctionCallOutput, developer messages) from the end of history until the estimated token count fits
  • On failure, logs full diagnostic info (bytes, tokens, model window) and returns Err

Compaction fails (API error)

Local path (compact.rs:170-187):

  • Retries up to provider.stream_max_retries() times with exponential backoff
  • On each retry: emits "Reconnecting... N/max" warning event
  • After all retries exhausted: emits EventMsg::Error, returns Err

Remote path (compact_remote.rs:53-63):

  • No retry logic — single attempt
  • On failure: logs diagnostic dump, emits EventMsg::Error, returns Err

In both cases, the caller (run_turn) returns None, aborting the current turn.
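The local-path retry behavior can be sketched as a generic helper. This is only an illustration: `backoff_delay`, `retry_with_backoff`, and the 200 ms base delay with doubling are assumptions, since the actual schedule is not given here. Sleeping and warning are injected as closures so the sketch stays synchronous and easy to test.

```rust
use std::time::Duration;

/// Hypothetical schedule: 200 ms for the first retry, doubling after that.
fn backoff_delay(retry: u32) -> Duration {
    let exponent = retry.saturating_sub(1).min(16);
    Duration::from_millis(200u64 << exponent)
}

/// Run `op`, retrying up to `max_retries` times after the first failure.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
    mut warn: impl FnMut(String),
) -> Result<T, E> {
    let mut retries = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retries exhausted: surface the last error to the caller.
            Err(err) if retries >= max_retries => return Err(err),
            Err(_) => {
                retries += 1;
                // Same shape as the "Reconnecting... N/max" warning event.
                warn(format!("Reconnecting... {retries}/{max_retries}"));
                sleep(backoff_delay(retries));
            }
        }
    }
}
```

The remote path would correspond to calling `op` exactly once with no loop around it.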

Model switched mid-conversation

maybe_run_previous_model_inline_compact (codex.rs:5279-5315):

  • Runs only when all three conditions hold: the new model has a smaller context window, the current token count would exceed the new model's limit, and the model slugs differ
  • Compaction runs against the previous model (correct endpoint and context window)
  • Uses InitialContextInjection::DoNotInject
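The three-part condition can be written as a single predicate. This is a sketch with hypothetical names: `ModelInfo` and its fields are stand-ins, and the strict `>` comparison against the new model's limit is an assumption about the exact boundary.

```rust
/// Hypothetical, simplified view of a model for this check.
struct ModelInfo<'a> {
    slug: &'a str,
    context_window: i64,
    auto_compact_limit: i64,
}

/// True when compaction should run against the previous model before
/// the conversation continues on the new one.
fn should_compact_with_previous_model(
    previous: &ModelInfo<'_>,
    next: &ModelInfo<'_>,
    total_usage_tokens: i64,
) -> bool {
    previous.slug != next.slug
        && next.context_window < previous.context_window
        // Assumption: "would exceed" is modeled as a strict comparison.
        && total_usage_tokens > next.auto_compact_limit
}
```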

Additionally, when resuming after compaction with a different model, a <model_switch> developer message is re-injected into the replacement history (compact.rs:process_compacted_history_reinjects_model_switch_message test).

User explicitly triggers compaction

Via the /compact slash command (TUI) → Op::Compact → CompactTask (tasks/compact.rs).

The app-server exposes thread/compact/start (JSON-RPC). Same routing logic as auto-compact (remote vs local based on provider).

After every manual compact, a WarningEvent fires:

"Heads up: Long threads and multiple compactions can cause the model to be less accurate. Start a new thread when possible to keep threads small and targeted."

Manual compact goes through run_compact_task() (not the run_inline_* variants), which emits a TurnStarted event first.

Summary itself is too long

Not explicitly bounded. The summary text is the model's last assistant message — subject to the model's normal output limits only. However, the replacement history is bounded:

  • User messages: up to COMPACT_USER_MESSAGE_MAX_TOKENS = 20_000 tokens total
  • If the compaction request itself would overflow the context window, oldest items are trimmed before retrying

Multiple compactions in sequence

Each compaction:

  1. Replaces history with [user_msgs..., summary_user_msg]
  2. The next compaction operates on that already-compacted history
  3. collect_user_messages() filters out messages starting with SUMMARY_PREFIX, so old summaries are not included in the user message list — only the most recent summary (the last item) survives
  4. The test suite (compact.rs:multi_auto_compact_*) verifies that three or more sequential compactions work correctly

is_summary_message() check (compact.rs:269-271):

pub(crate) fn is_summary_message(message: &str) -> bool {
    message.starts_with(format!("{SUMMARY_PREFIX}\n").as_str())
}
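A rough sketch of how that check keeps old summaries out of the next compaction. The `SUMMARY_PREFIX` value here is shortened for illustration, and this `collect_user_messages` is a simplified stand-in that works on plain strings rather than real history items.

```rust
/// Shortened stand-in for the real prefix text (illustration only).
const SUMMARY_PREFIX: &str = "Another language model started to solve this problem";

/// Same check as above: prefix followed by a newline.
fn is_summary_message(message: &str) -> bool {
    message.starts_with(format!("{SUMMARY_PREFIX}\n").as_str())
}

/// Simplified stand-in: keep user messages, skip earlier summaries.
fn collect_user_messages(history: &[String]) -> Vec<String> {
    history
        .iter()
        .filter(|message| !is_summary_message(message.as_str()))
        .cloned()
        .collect()
}
```

Because the check requires the newline after the prefix, a user message that merely quotes the prefix text on its own is not treated as a summary.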

InitialContextInjection modes

This controls where fresh system context is re-injected relative to the compacted history:

  • DoNotInject (pre-turn / manual compact): Clears reference_context_item. The next regular turn will reinject context fresh from scratch.
  • BeforeLastUserMessage (mid-turn compact): The initial context (cwd, date, permissions, model, etc.) is spliced into the compacted history before the last real user message so the model still sees the user's pending request as the last item.
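The BeforeLastUserMessage placement can be illustrated with a small splice over a simplified history. The `Entry` enum and the function name are hypothetical, and appending the context at the end when no user message exists is an assumption made only so the sketch is total.

```rust
/// Simplified history entry (hypothetical stand-in for real history items).
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    User(String),
    Summary(String),
    Context(String),
}

/// Insert `context` immediately before the last real user message so that
/// the user's pending request stays the final item the model sees.
fn inject_before_last_user_message(
    mut history: Vec<Entry>,
    context: Vec<Entry>,
) -> Vec<Entry> {
    let index = history
        .iter()
        .rposition(|entry| matches!(entry, Entry::User(_)))
        // Assumption: with no user message, append the context at the end.
        .unwrap_or(history.len());
    // An empty range at `index` inserts without removing anything.
    history.splice(index..index, context);
    history
}
```

With DoNotInject nothing is spliced at all; the next regular turn rebuilds the context from scratch.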

Ghost snapshots

GhostSnapshot items (used for undo functionality) are preserved separately and appended after the replacement history in both local and remote paths. They are invisible to the model but allow /undo to work post-compaction.


5. API Structure

Local (non-OpenAI)

  • Endpoint: Same as regular turns (POST /v1/responses or provider-equivalent)
  • Model: Same as conversation model
  • Request: Full conversation history + SUMMARIZATION_PROMPT as final user message
  • Transport: SSE streaming, same stream() / drain_to_completed() pipeline
  • Response: Extracts last assistant message text

Remote (OpenAI)

  • Endpoint: POST /v1/responses/compact
  • Model: Same as conversation model (passed in { model, input, instructions })
  • Request: { model: "...", input: [...ResponseItems], instructions: "..." } — non-streaming
  • Response: { "output": [...ResponseItems] } — JSON, not SSE
  • Headers: Standard auth + chatgpt-conversation-id + subagent source header "compact"

6. Config Keys

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| compact_prompt | Option<String> | None → uses built-in template | Custom summarization prompt sent to the model |
| experimental_compact_prompt_file | Option<AbsolutePathBuf> | None | Load custom prompt from a file (used when compact_prompt is not set; see the resolution order) |
| model_auto_compact_token_limit | Option<i64> | None → 90% of context_window | Token threshold for auto-compaction; capped at 90% of context window |
| model_context_window | Option<i64> | model-dependent | Context window override; affects the 90% default threshold |

Resolution order for compact_prompt (config/mod.rs:2007-2046):

  1. ConfigOverrides.compact_prompt (CLI --compact-prompt flag or API override)
  2. ConfigToml.compact_prompt (from config.toml)
  3. experimental_compact_prompt_file (from profile or config)
  4. Falls back to built-in SUMMARIZATION_PROMPT constant

Important: compact_prompt being None does not disable compaction — it just uses the default prompt. Compaction is controlled solely by model_auto_compact_token_limit (or context_window).
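The resolution order above amounts to a chain of fallbacks. The sketch below assumes each source has already been read into an `Option<String>`; the function name and the placeholder prompt text are hypothetical, and any trimming or empty-string handling the real code may do is left out.

```rust
/// Placeholder for the built-in prompt text (illustration only).
const SUMMARIZATION_PROMPT: &str = "<built-in summarization prompt>";

/// First source that is set wins, in the order listed above.
fn resolve_compact_prompt(
    override_prompt: Option<String>, // ConfigOverrides.compact_prompt
    toml_prompt: Option<String>,     // ConfigToml.compact_prompt
    file_prompt: Option<String>,     // contents of experimental_compact_prompt_file
) -> String {
    override_prompt
        .or(toml_prompt)
        .or(file_prompt)
        .unwrap_or_else(|| SUMMARIZATION_PROMPT.to_string())
}
```

Since the final fallback always yields a prompt, leaving every source unset changes only the wording sent to the model, never whether compaction runs.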


Summary

| Aspect | Detail |
| --- | --- |
| Trigger | Token count ≥ 90% of context window (or custom model_auto_compact_token_limit) |
| Mechanism | Full extraction: history is replaced, not appended to |
| Prompt delivered as | User message appended to history (local) / proprietary API (remote) |
| Summary stored as | User-role message with SUMMARY_PREFIX preamble (local path) |
| User messages preserved | Most recent, up to 20,000 tokens |
| Model used | Same as conversation model |
| OpenAI vs others | OpenAI → POST /v1/responses/compact (server-side); others → inline summarization turn |
| On failure | Turn aborted; error event emitted; no partial state written |
| On context overflow during compact | Oldest items trimmed iteratively until the prompt fits |
| User-triggerable | Yes, via /compact slash command |
| Multiple compactions | Supported; old summaries are excluded from the next compaction's user message list |

Codex Compression Prompt

Source: codex-rs/core/templates/compact/prompt.md
Embedded constant: SUMMARIZATION_PROMPT in core/src/compact.rs:31


Summarization Prompt (sent to model)

You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.

Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue

Be concise, structured, and focused on helping the next LLM seamlessly continue the work.

Summary Prefix (prepended to model response before storing)

Source: codex-rs/core/templates/compact/summary_prefix.md
Embedded constant: SUMMARY_PREFIX in core/src/compact.rs:32

Another language model started to solve this problem and produced a summary of its thinking process. You also have access to the state of the tools that were used by that language model. Use this to build on the work that has already been done and avoid duplicating work. Here is the summary produced by the other language model, use the information in this summary to assist with your own analysis:

How These Are Used

  1. Summarization prompt is appended as a new user message to the existing conversation history, then sent to the model for a streaming response.

  2. The model's last assistant reply is extracted as summary_suffix.

  3. The stored summary text is: "{SUMMARY_PREFIX}\n{summary_suffix}"

  4. This combined text is injected into the replacement history as a user-role message — it is the last item in the compacted history.

  5. When the next turn starts, the SUMMARY_PREFIX text tells the model it is picking up from a prior compaction.

The is_summary_message() function uses this prefix to identify summary messages and exclude them from subsequent compaction's user-message collection:

pub(crate) fn is_summary_message(message: &str) -> bool {
    message.starts_with(format!("{SUMMARY_PREFIX}\n").as_str())
}