Repo: https://github.com/openai/codex
Language: Rust (codex-rs/)
Key files:
- core/src/compact.rs - local (non-OpenAI) compaction logic
- core/src/compact_remote.rs - remote (OpenAI) compaction logic
- core/src/tasks/compact.rs - task dispatcher
- core/templates/compact/prompt.md - the summarization prompt
- core/templates/compact/summary_prefix.md - prepended to stored summary
- codex-api/src/endpoint/compact.rs - API client for remote endpoint
- protocol/src/openai_models.rs:276 - threshold computation
- core/src/codex.rs:4895–5270 - trigger logic
Source: core/templates/compact/prompt.md (embedded via include_str!)
You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.
Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue
Be concise, structured, and focused on helping the next LLM seamlessly continue the work.
Summary prefix — prepended to the model's reply before storing
Source: core/templates/compact/summary_prefix.md
Another language model started to solve this problem and produced a summary of its thinking process. You also have access to the state of the tools that were used by that language model. Use this to build on the work that has already been done and avoid duplicating work. Here is the summary produced by the other language model, use the information in this summary to assist with your own analysis:
This prefix is only used in the local (non-OpenAI) path. It becomes the preamble of the summary user-message injected into the replacement history.
Metric: Cumulative token count (total_usage_tokens) from the last API response.
Formula (protocol/src/openai_models.rs:276-287):

    pub fn auto_compact_token_limit(&self) -> Option<i64> {
        let context_limit = self
            .context_window
            .map(|context_window| (context_window * 9) / 10); // 90% of context window
        let config_limit = self.auto_compact_token_limit;
        if let Some(context_limit) = context_limit {
            return Some(
                config_limit.map_or(context_limit, |limit| std::cmp::min(limit, context_limit)),
            );
        }
        config_limit
    }

- Default: 90% of the model's context window (e.g., 180K for a 200K model)
- User override: model_auto_compact_token_limit in config.toml, capped at the 90% ceiling; it can only lower the threshold, not raise it
If neither context_window nor model_auto_compact_token_limit is set, auto_compact_token_limit() returns None, and the caller falls back to i64::MAX — meaning compaction never fires automatically.
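To make the threshold arithmetic concrete, here is a minimal self-contained sketch. The ModelLimits struct and the effective_limit helper are hypothetical stand-ins for illustration; only the body of auto_compact_token_limit follows the formula quoted above.

```rust
/// Hypothetical stand-in for the model metadata; only the two fields
/// used by the threshold formula are modeled.
struct ModelLimits {
    context_window: Option<i64>,
    auto_compact_token_limit: Option<i64>,
}

impl ModelLimits {
    /// 90% of the context window, optionally lowered (never raised) by config.
    fn auto_compact_token_limit(&self) -> Option<i64> {
        let context_limit = self.context_window.map(|w| (w * 9) / 10);
        match (context_limit, self.auto_compact_token_limit) {
            (Some(ceiling), Some(limit)) => Some(limit.min(ceiling)),
            (Some(ceiling), None) => Some(ceiling),
            (None, config_limit) => config_limit,
        }
    }
}

/// Hypothetical caller-side helper: a missing limit becomes i64::MAX,
/// so the ">= limit" check can never succeed.
fn effective_limit(limits: &ModelLimits) -> i64 {
    limits.auto_compact_token_limit().unwrap_or(i64::MAX)
}
```

With a 200,000-token window and no override this yields 180,000; an override of 100,000 yields 100,000; an override of 250,000 is capped at 180,000.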
- Pre-sampling / pre-turn (codex.rs:5250-5270): Before processing the new user turn, if total_usage_tokens >= auto_compact_limit, compaction runs with InitialContextInjection::DoNotInject. History is replaced, and the next turn reinjects all context fresh.
- Mid-turn (codex.rs:5122-5150): After a sampling response completes, if token_limit_reached && needs_follow_up (the model wants to continue tool calling), compaction runs with InitialContextInjection::BeforeLastUserMessage, so the last user message is preserved as the final item.
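The two trigger points can be sketched as a pair of small predicates. This is an illustration only: the function names pre_turn_compaction and mid_turn_compaction are hypothetical, and the real checks live inline in codex.rs alongside other state.

```rust
/// The two injection modes named in the text.
#[derive(Debug, PartialEq)]
enum InitialContextInjection {
    DoNotInject,
    BeforeLastUserMessage,
}

/// Pre-turn check: compact before the new user turn once usage reaches the limit.
fn pre_turn_compaction(
    total_usage_tokens: i64,
    auto_compact_limit: i64,
) -> Option<InitialContextInjection> {
    (total_usage_tokens >= auto_compact_limit).then_some(InitialContextInjection::DoNotInject)
}

/// Mid-turn check: compact only if the limit was hit AND the model still
/// wants to continue (for example, more tool calls are pending).
fn mid_turn_compaction(
    token_limit_reached: bool,
    needs_follow_up: bool,
) -> Option<InitialContextInjection> {
    (token_limit_reached && needs_follow_up)
        .then_some(InitialContextInjection::BeforeLastUserMessage)
}
```

Returning None from either function means no compaction runs at that point.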
Codex uses full extraction and replacement. The history is completely discarded and rebuilt; it is NOT a piggyback that appends a summary to the existing history.
- Appends SUMMARIZATION_PROMPT as a new user message to the current history
- Streams a standard response request to the same model at the same endpoint
- Extracts the model's last assistant message as the summary text
- Prepends SUMMARY_PREFIX to get summary_text
- Calls collect_user_messages(): extracts all raw user messages from history, filtering out previous summaries
- Calls build_compacted_history(): selects the most recent user messages up to 20,000 tokens (most recent first, then reversed)
- Replacement history: [recent_user_msgs..., summary_as_user_message]
- Calls sess.replace_compacted_history(...): full swap
The summary is stored as a user-role message (not system/assistant/developer).
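The selection step for the replacement history can be sketched as follows. This is a simplified illustration, not the actual build_compacted_history(): the approx_tokens estimator (about 4 bytes per token) and the function name build_replacement_history are assumptions, and this sketch simply stops at the first message that does not fit the budget.

```rust
/// Budget quoted in the text for preserved user messages.
const COMPACT_USER_MESSAGE_MAX_TOKENS: usize = 20_000;

/// Hypothetical token estimator (about 4 bytes per token); the real counter may differ.
fn approx_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Walks user messages newest-first while they fit the budget, restores
/// chronological order, then appends the summary as the final item.
fn build_replacement_history(
    user_messages: &[String],
    summary_text: &str,
    max_tokens: usize,
) -> Vec<String> {
    let mut remaining = max_tokens;
    let mut selected: Vec<String> = Vec::new();
    for message in user_messages.iter().rev() {
        let cost = approx_tokens(message);
        if cost > remaining {
            break; // assumption: stop at the first message that does not fit
        }
        remaining -= cost;
        selected.push(message.clone());
    }
    selected.reverse(); // newest-first back to chronological order
    selected.push(summary_text.to_string());
    selected
}
```

Calling build_replacement_history(&messages, &summary, COMPACT_USER_MESSAGE_MAX_TOKENS) produces the [recent_user_msgs..., summary] shape described above.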
- Sends full history + base instructions to POST /v1/responses/compact (a proprietary OpenAI API endpoint)
- Gets back { "output": Vec<ResponseItem> } as non-streaming JSON
- Filters the response: drops developer messages, drops non-user-content user messages, keeps assistant messages, real user messages, and Compaction encrypted items
- Calls sess.replace_compacted_history(...): full swap
- CompactedItem.message is set to "" (empty), so there is no local summary text
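The response filter in the remote path can be illustrated with a simplified model. The OutputItem enum below is a hypothetical stand-in for ResponseItem (the real type has more variants and fields), and dropping every other item kind is an assumption inferred from the keep-list above.

```rust
/// Hypothetical, simplified stand-in for ResponseItem.
#[derive(Debug, Clone, PartialEq)]
enum OutputItem {
    Message { role: String, is_real_user_content: bool },
    Compaction,
    Other,
}

/// Keeps assistant messages, real user messages, and compaction items.
fn filter_compact_output(output: Vec<OutputItem>) -> Vec<OutputItem> {
    output
        .into_iter()
        .filter(|item| match item {
            OutputItem::Message { role, is_real_user_content } => match role.as_str() {
                "assistant" => true,
                "user" => *is_real_user_content,
                _ => false, // developer (and any other role) is dropped
            },
            OutputItem::Compaction => true,
            OutputItem::Other => false, // assumption: anything else is dropped
        })
        .collect()
}
```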
Which path is chosen: compact.rs:50-52

    pub(crate) fn should_use_remote_compact_task(provider: &ModelProviderInfo) -> bool {
        provider.is_openai()
    }

During compaction, if ContextWindowExceeded fires and turn_input_len > 1:
- Iteratively removes the oldest history item (history.remove_first_item()) and retries
- User sees: "Trimmed N older thread item(s) before compacting so the prompt fits the model context window."
If turn_input_len == 1 (nothing left to trim):
- set_total_tokens_full() is called (marks context as full)
- EventMsg::Error is emitted and the turn is aborted
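The trim-and-retry behavior for the local path can be sketched as a loop. This is a minimal illustration under stated assumptions: CompactError and compact_with_trimming are hypothetical names, history is modeled as a VecDeque of strings, and history.len() stands in for turn_input_len.

```rust
use std::collections::VecDeque;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum CompactError {
    ContextWindowExceeded,
}

/// Retries `attempt`, dropping the oldest history item after each overflow.
/// Returns how many items were trimmed, or the error once one item is left.
fn compact_with_trimming<F>(
    history: &mut VecDeque<String>,
    mut attempt: F,
) -> Result<usize, CompactError>
where
    F: FnMut(&VecDeque<String>) -> Result<(), CompactError>,
{
    let mut trimmed = 0;
    loop {
        match attempt(&*history) {
            Ok(()) => return Ok(trimmed),
            Err(CompactError::ContextWindowExceeded) if history.len() > 1 => {
                history.pop_front(); // remove the oldest item, then retry
                trimmed += 1;
            }
            // Nothing left to trim: the caller marks the context as full
            // and reports the error.
            Err(err) => return Err(err),
        }
    }
}
```

The returned count corresponds to the N in the "Trimmed N older thread item(s)" message.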
For remote compaction (compact_remote.rs:256-283):
- trim_function_call_history_to_fit_context_window() runs before the API call
- Removes codex-generated items (FunctionCallOutput, developer messages) from the end of history until the estimated token count fits
- On failure, logs full diagnostic info (bytes, tokens, model window) and returns Err
Local path (compact.rs:170-187):
- Retries up to provider.stream_max_retries() times with exponential backoff
- On each retry: emits "Reconnecting... N/max" warning event
- After all retries exhausted: emits EventMsg::Error, returns Err
Remote path (compact_remote.rs:53-63):
- No retry logic: single attempt
- On failure: logs diagnostic dump, emits EventMsg::Error, returns Err
In both cases, the caller (run_turn) returns None, aborting the current turn.
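The retry behavior of the local path can be sketched as a generic loop. The backoff schedule here (200 ms, doubling per retry, no jitter) is an assumption, since the text does not give the actual delays; backoff_delay and run_with_retries are hypothetical names, and the sleep function is injected so the sketch does not block.

```rust
use std::time::Duration;

/// Hypothetical backoff schedule: 200 ms doubled per retry.
fn backoff_delay(retry_index: u32) -> Duration {
    Duration::from_millis(200u64.saturating_mul(1u64 << retry_index.min(16)))
}

/// Runs `op` once, then retries up to `max_retries` times, calling `sleep`
/// with an exponentially growing delay between attempts.
fn run_with_retries<T, E>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut retries = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retries exhausted: the caller emits the error event.
            Err(err) if retries >= max_retries => return Err(err),
            Err(_) => {
                retries += 1;
                // A "Reconnecting... N/max" warning would be emitted here.
                sleep(backoff_delay(retries - 1));
            }
        }
    }
}
```

Passing max_retries = 0 gives the single-attempt behavior described for the remote path.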
maybe_run_previous_model_inline_compact (codex.rs:5279-5315):
- Triggers when switching to a model with a smaller context window AND the current token count would exceed the new model's limit AND the model slugs differ
- Compaction runs against the previous model (correct endpoint, context window)
- Uses InitialContextInjection::DoNotInject
Additionally, when resuming after compaction with a different model, a <model_switch> developer message is re-injected into the replacement history (compact.rs:process_compacted_history_reinjects_model_switch_message test).
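The three-part condition for compacting with the previous model can be written as a single predicate. This is a sketch only: ModelSnapshot and should_compact_with_previous_model are hypothetical names, and using a strict ">" for "would exceed" is an assumption.

```rust
/// Hypothetical summary of the per-model values the check needs.
struct ModelSnapshot<'a> {
    slug: &'a str,
    context_window: i64,
    auto_compact_limit: i64,
}

/// True when all three conditions from the text hold: different slug,
/// smaller context window, and current usage above the new model's limit.
fn should_compact_with_previous_model(
    previous: &ModelSnapshot,
    next: &ModelSnapshot,
    total_usage_tokens: i64,
) -> bool {
    previous.slug != next.slug
        && next.context_window < previous.context_window
        && total_usage_tokens > next.auto_compact_limit
}
```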
Via slash command (TUI) → Op::Compact → CompactTask (tasks/compact.rs).
The app-server exposes thread/compact/start (JSON-RPC). Same routing logic as auto-compact (remote vs local based on provider).
After every manual compact, a WarningEvent fires:
"Heads up: Long threads and multiple compactions can cause the model to be less accurate. Start a new thread when possible to keep threads small and targeted."
Manual compact goes through run_compact_task() (not the run_inline_* variants), which emits a TurnStarted event first.
The summary length is not explicitly bounded. The summary text is the model's last assistant message, subject only to the model's normal output limits. However, the replacement history is bounded:
- User messages: up to COMPACT_USER_MESSAGE_MAX_TOKENS = 20_000 tokens total
- If the compaction request itself would overflow the context window, oldest items are trimmed before retrying
Each compaction:
- Replaces history with [user_msgs..., summary_user_msg]
- The next compaction operates on that already-compacted history
- collect_user_messages() filters out messages starting with SUMMARY_PREFIX, so old summaries are not included in the user message list; only the most recent summary (the last item) survives
- The test suite (compact.rs:multi_auto_compact_*) verifies that three or more sequential compactions work correctly
is_summary_message() check (compact.rs:269-271):

    pub(crate) fn is_summary_message(message: &str) -> bool {
        message.starts_with(format!("{SUMMARY_PREFIX}\n").as_str())
    }

InitialContextInjection controls where fresh system context is re-injected relative to the compacted history:
- DoNotInject (pre-turn / manual compact): Clears reference_context_item. The next regular turn will reinject context fresh from scratch.
- BeforeLastUserMessage (mid-turn compact): The initial context (cwd, date, permissions, model, etc.) is spliced into the compacted history before the last real user message so the model still sees the user's pending request as the last item.
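The BeforeLastUserMessage splice can be sketched on a simplified history. HistoryItem and inject_before_last_user_message are hypothetical names; the sketch models only real user messages versus everything else, and it does not model how the summary message itself is positioned.

```rust
/// Hypothetical, simplified history item.
#[derive(Debug, Clone, PartialEq)]
enum HistoryItem {
    User(String),
    Other(String),
}

/// Inserts `initial_context` immediately before the last user message.
/// If there is no user message, the context is appended at the end.
fn inject_before_last_user_message(
    mut history: Vec<HistoryItem>,
    initial_context: Vec<HistoryItem>,
) -> Vec<HistoryItem> {
    let index = history
        .iter()
        .rposition(|item| matches!(item, HistoryItem::User(_)))
        .unwrap_or(history.len());
    let tail = history.split_off(index); // last user message and anything after it
    history.extend(initial_context);
    history.extend(tail);
    history
}
```

After the splice, the pending user request is still the final user item the model sees.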
GhostSnapshot items (used for undo functionality) are preserved separately and appended after the replacement history in both local and remote paths. They are invisible to the model but allow /undo to work post-compaction.
Local path:
- Endpoint: Same as regular turns (POST /v1/responses or provider-equivalent)
- Model: Same as conversation model
- Request: Full conversation history + SUMMARIZATION_PROMPT as final user message
- Transport: SSE streaming, same stream() / drain_to_completed() pipeline
- Response: Extracts last assistant message text
Remote path:
- Endpoint: POST /v1/responses/compact
- Model: Same as conversation model (passed in { model, input, instructions })
- Request: { model: "...", input: [...ResponseItems], instructions: "..." }, non-streaming
- Response: { "output": [...ResponseItems] } as JSON, not SSE
- Headers: Standard auth + chatgpt-conversation-id + subagent source header "compact"
| Key | Type | Default | Description |
|---|---|---|---|
| compact_prompt | Option<String> | None → uses built-in template | Custom summarization prompt sent to the model |
| experimental_compact_prompt_file | Option<AbsolutePathBuf> | None | Load custom prompt from a file (overrides compact_prompt if provided) |
| model_auto_compact_token_limit | Option<i64> | None → 90% of context_window | Token threshold for auto-compaction; capped at 90% of context window |
| model_context_window | Option<i64> | model-dependent | Context window override; affects the 90% default threshold |
Resolution order for compact_prompt (config/mod.rs:2007-2046):
1. ConfigOverrides.compact_prompt (CLI --compact-prompt flag or API override)
2. ConfigToml.compact_prompt (from config.toml)
3. experimental_compact_prompt_file (from profile or config)
4. Falls back to built-in SUMMARIZATION_PROMPT constant
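The resolution order listed above amounts to a chain of fallbacks, which can be sketched with Option::or. The function name resolve_compact_prompt and its parameters are hypothetical, and the constant below is a placeholder rather than the real template text.

```rust
/// Placeholder for the built-in template (the real text comes from prompt.md).
const SUMMARIZATION_PROMPT: &str = "<built-in compact prompt>";

/// Picks the first available prompt in the listed order, falling back to
/// the built-in constant when nothing is configured.
fn resolve_compact_prompt(
    cli_override: Option<String>,
    config_toml: Option<String>,
    prompt_file_contents: Option<String>,
) -> String {
    cli_override
        .or(config_toml)
        .or(prompt_file_contents)
        .unwrap_or_else(|| SUMMARIZATION_PROMPT.to_string())
}
```

With all three inputs set to None, the function returns the built-in prompt, which matches the point that an unset compact_prompt does not disable compaction.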
Important: compact_prompt being None does not disable compaction — it just uses the default prompt. Compaction is controlled solely by model_auto_compact_token_limit (or context_window).
| Aspect | Detail |
|---|---|
| Trigger | Token count ≥ 90% of context window (or custom model_auto_compact_token_limit) |
| Mechanism | Full extraction — history replaced, not appended to |
| Prompt delivered as | User message appended to history (local) / proprietary API (remote) |
| Summary stored as | User-role message with SUMMARY_PREFIX preamble |
| User messages preserved | Most recent up to 20,000 tokens |
| Model used | Same as conversation model |
| OpenAI vs others | OpenAI → POST /responses/compact (server-side); others → inline summarization turn |
| On failure | Turn aborted; error event emitted; no partial state written |
| On context overflow during compact | Oldest items trimmed iteratively until it fits |
| User-triggerable | Yes, via /compact slash command |
| Multiple compactions | Supported; old summaries are excluded from next compaction's user message list |