@sam-saffron-jarvis
Created March 4, 2026 22:29
OpenHands — Condensation/Compression Deep Dive

OpenHands — Compression/Condensation Analysis

Repo: https://github.com/All-Hands-AI/OpenHands
Commit: bf769d1
Language: Python


Summary

OpenHands has the most architecturally distinct compression system of any harness studied. Rather than operating on a chat message history, it operates on an event store — everything is an Event (actions, observations, tool results, agent thoughts) with a persistent numeric ID. Compression is handled by a plugin-based condenser system with 9 pluggable strategies, three of which use LLM calls. The default is a no-LLM window condenser.

The agent itself can also request condensation by calling a tool, allowing the model to self-regulate.


Architecture: Event Store + View

Events

Every step in an OpenHands session produces events: MessageAction, CmdRunAction, BrowserOutputObservation, CondensationAction, etc. All events are persisted with sequential integer IDs.

View

A View is a filtered list of events computed from the full event store on every agent step:

View.from_events(all_events)

This replays all CondensationAction events in order, accumulating a set of forgotten_event_ids. Events in that set are excluded from the view. If the most recent CondensationAction has a summary, an AgentCondensationObservation is inserted at summary_offset (default: 1, after the system message).

Condenser interface

class Condenser(ABC):
    def condense(self, view: View) -> View | Condensation: ...

Returns either:

  • A new View (filtered in-place, e.g. masking condensers)
  • A Condensation wrapping a CondensationAction to be added to the event store

Agent loop

On each agent step (codeact_agent.py):

match self.condenser.condensed_history(state):
    case View() as events:
        condensed_history = events
    case Condensation() as condensation:
        return condensation.action  # ← returned as the agent's "action" this turn

If a Condensation is returned, the agent emits CondensationAction instead of doing real work. On the next step, View.from_events applies it, and the agent proceeds with the condensed view.


Mechanism: Event Erasure (Not Piggyback or Extract)

This is neither piggyback nor extract in the traditional sense. The mechanism is:

  1. CondensationAction(forgotten_events_start_id=X, forgotten_events_end_id=Y, summary=..., summary_offset=K) is persisted to the event store
  2. On all future steps, View.from_events excludes event IDs in [X, Y] (and the CondensationAction itself)
  3. If summary is set, an AgentCondensationObservation(content=summary) is inserted at position K

Non-destructive: Events are never deleted from the persistent store. The CondensationAction is the only record needed to reconstruct which events are hidden.
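The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the OpenHands implementation: the `Event` and `CondensationAction` dataclasses and the `build_view` function are hypothetical stand-ins, and the summary is represented by a plain `Event` with a placeholder ID rather than a real AgentCondensationObservation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    id: int
    content: str


@dataclass
class CondensationAction(Event):
    # Field names follow the description above; the class itself is a stand-in.
    forgotten_events_start_id: int = 0
    forgotten_events_end_id: int = -1
    summary: Optional[str] = None
    summary_offset: Optional[int] = None


def build_view(events: list[Event]) -> list[Event]:
    """Replay condensations over the full store and return the visible events."""
    forgotten: set[int] = set()
    summary: Optional[str] = None
    offset: Optional[int] = None
    for event in events:
        if isinstance(event, CondensationAction):
            forgotten.update(
                range(event.forgotten_events_start_id, event.forgotten_events_end_id + 1)
            )
            forgotten.add(event.id)  # the condensation record itself is hidden
            # Only the most recent condensation's summary is kept.
            summary, offset = event.summary, event.summary_offset
    view = [e for e in events if e.id not in forgotten]
    if summary is not None and offset is not None:
        # Stand-in for AgentCondensationObservation; -1 is a hypothetical placeholder ID.
        view.insert(offset, Event(id=-1, content=summary))
    return view


store = [Event(i, f"event {i}") for i in range(6)]
store.append(CondensationAction(6, "condense", 1, 3, "summary of 1-3", 1))
view = build_view(store)
print([e.content for e in view])
# ['event 0', 'summary of 1-3', 'event 4', 'event 5']
```

The store still holds all seven events after the call; only the computed view shrinks, which is the non-destructive property described above.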


Condenser Strategies (9 total)

Non-LLM (no API call)

| Condenser | Config type | Trigger | Behavior |
|---|---|---|---|
| NoOpCondenser | noop | Never | Returns view unchanged |
| ObservationMaskingCondenser | observation_masking | Every step | Replaces observations outside attention_window (default: 5) with <MASKED> in-place. View mutation only. |
| BrowserOutputCondenser | browser_output | Every step | Replaces old BrowserOutputObservation events outside attention_window (default: 1) with "Visited URL {url}\nContent omitted". Keeps only the most recent browser screenshot/tree. |
| RecentEventsCondenser | recent_events | Every step | Keeps first keep_first + most recent max_events events. Returns truncated View. |
| AmortizedForgettingCondenser | amortized_forgetting | len(view) > max_size or condensation request | Forgets middle events (keeps head + tail = max_size // 2). No summary. |
| ConversationWindowCondenser | conversation_window | Only on unhandled_condensation_request | Keeps essential initial events (system msg, first user msg, recall obs) + most recent ~half. No summary. |
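As an illustration of the masking row in the table, here is a minimal sketch of window-based observation masking. It assumes the attention window is counted in events from the end of the view; the `Ev` dataclass, its `kind` tag, and `mask_old_observations` are hypothetical names, not OpenHands types.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ev:
    kind: str  # "action" or "observation" (hypothetical tagging)
    content: str


def mask_old_observations(view: list[Ev], attention_window: int = 5) -> list[Ev]:
    """Return a copy of the view with observations before the window masked."""
    cutoff = len(view) - attention_window
    return [
        replace(e, content="<MASKED>") if e.kind == "observation" and i < cutoff else e
        for i, e in enumerate(view)
    ]
```

Because the result is a fresh list, nothing is written back to the event store, which matches the "view mutation only" behaviour noted in the table.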

LLM-based

| Condenser | Config type | Trigger | Prompt style |
|---|---|---|---|
| LLMSummarizingCondenser | llm | len(view) > max_size or condensation request | Free-form text rolling summary |
| StructuredSummaryCondenser | structured_summary | len(view) > max_size or condensation request | Forced structured output via function calling (18-field StateSummary) |
| LLMAttentionCondenser | llm_attention | len(view) > max_size or condensation request | Ranks events by importance, keeps top-N (no summary text) |

Pipeline

CondenserPipeline (type = "pipeline") chains multiple condensers. Each step receives the output view of the previous. Stops at first Condensation. Useful for combining e.g. BrowserOutputCondenser + LLMSummarizingCondenser.
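The chaining rule can be sketched in a few lines. This is an illustrative model only: `View`, `Condensation`, `run_pipeline`, and the two example steps are hypothetical stand-ins that mimic the "pass the view along, stop at the first Condensation" behaviour described above.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class View:
    events: list[str]


@dataclass
class Condensation:
    reason: str


Step = Callable[[View], Union[View, Condensation]]


def run_pipeline(steps: list[Step], view: View) -> Union[View, Condensation]:
    """Feed each step the previous step's view; stop at the first Condensation."""
    current = view
    for step in steps:
        result = step(current)
        if isinstance(result, Condensation):
            return result  # later steps never run this turn
        current = result
    return current


def drop_browser_output(view: View) -> View:
    # Cheap, view-only filtering step.
    return View([e for e in view.events if not e.startswith("browser:")])


def condense_if_long(view: View) -> Union[View, Condensation]:
    # Stand-in for a rolling condenser with a tiny size limit.
    return Condensation("view too long") if len(view.events) > 2 else view
```

Ordering matters in this model: placing the cheap masking step first can shrink the view enough that the more expensive step does not fire at all.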


Default Condenser

openhands/core/config/agent_config.py, line 63

condenser: CondenserConfig = Field(
    default_factory=lambda: ConversationWindowCondenserConfig()
)

ConversationWindowCondenser uses no LLM and fires only on unhandled_condensation_request. By default, condensation therefore happens only when:

  1. The agent itself calls the request_condensation tool, OR
  2. The agent is overloaded enough that the model emits a condensation request

Without any explicit user/agent condensation request, no compression happens under the default config.


Threshold

For RollingCondenser subclasses (LLM-based + AmortizedForgettingCondenser)

def should_condense(self, view: View) -> bool:
    return len(view) > self.max_size or view.unhandled_condensation_request

  • max_size default: 100 events (configurable per condenser)
  • keep_first default: 1 event (system message / first message always kept)
  • After condensation, history is trimmed to max_size // 2 (= 50 events by default)
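To make the arithmetic concrete, here is a small sketch of the trigger and of the head/middle/tail split for the no-summary case (AmortizedForgetting-style). The function names are hypothetical, and the validation is placed inside `split_view` purely for illustration.

```python
def should_condense(view_len: int, max_size: int = 100, requested: bool = False) -> bool:
    """Event-count trigger, or an explicit condensation request."""
    return view_len > max_size or requested


def split_view(event_ids: list[int], max_size: int = 100, keep_first: int = 1):
    """Split a view into (head, forgotten, tail) so head + tail == max_size // 2."""
    if keep_first >= max_size // 2:
        raise ValueError("keep_first must be smaller than max_size // 2")
    target_size = max_size // 2
    tail_len = target_size - keep_first  # always >= 1 after the check above
    head = event_ids[:keep_first]
    forgotten = event_ids[keep_first:-tail_len]
    tail = event_ids[-tail_len:]
    return head, forgotten, tail


head, forgotten, tail = split_view(list(range(101)))
print(len(head), len(forgotten), len(tail))  # 1 51 49
```

With the defaults, a view of 101 events is the first to exceed the limit; 51 middle events are forgotten and 50 remain.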

For ConversationWindowCondenser (default)

def should_condense(self, view: View) -> bool:
    return view.unhandled_condensation_request

Only fires on agent-triggered requests. No event-count limit.

For masking/window condensers (ObservationMaskingCondenser, BrowserOutputCondenser, RecentEventsCondenser)

Fire on every step — they modify the view inline, not via CondensationAction.

Important: None of the condensers check token counts directly. Thresholds are event counts. Token budget management is not built into the condenser system — it relies on the LLM's context window pressure and the agent's self-triggered request_condensation.


Agent-Triggered Condensation

The CodeActAgent includes a request_condensation tool in its tool list. The agent model can call this when it feels the context is getting unwieldy:

{
  "name": "request_condensation",
  "description": "Request a condensation of the conversation history when the context becomes too long..."
}

Calling this tool produces a CondensationRequestAction event. View.from_events detects this and sets view.unhandled_condensation_request = True. The condenser then fires on the next step.

This is the primary trigger for the default ConversationWindowCondenser. It's also an additional trigger for all RollingCondenser subclasses (which would fire anyway at max_size, but self-request can happen earlier).
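One plausible way to decide whether a request is still "unhandled" is to scan the history from newest to oldest and see which marker appears first. The sketch below makes that assumption explicit; the string tags and the `has_unhandled_request` helper are hypothetical and stand in for CondensationRequestAction and CondensationAction events.

```python
def has_unhandled_request(event_kinds: list[str]) -> bool:
    """Scan from newest to oldest; whichever marker appears first decides."""
    for kind in reversed(event_kinds):
        if kind == "condensation":
            return False  # any earlier request has already been handled
        if kind == "condensation_request":
            return True
    return False
```

Under this model a request stays pending until a condensation is recorded after it, so the flag clears itself on the step after the condenser fires.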


LLM Rolling Summary Mechanism

For LLMSummarizingCondenser and StructuredSummaryCondenser:

HEAD (keep_first events)
  ↓
[AgentCondensationObservation with rolling summary]  ← inserted by View.from_events
  ↓
TAIL (target_size - keep_first - 1 events from end, where target_size = max_size // 2)

On each condensation:

  1. view[keep_first] is checked: if it's already an AgentCondensationObservation, it becomes the <PREVIOUS SUMMARY> in the new prompt
  2. forgotten_events = view[keep_first : -events_from_tail] — the middle chunk being dropped
  3. LLM summarises forgotten_events with awareness of the previous summary
  4. New CondensationAction with summary=<new_text> and summary_offset=keep_first is persisted
  5. Old CondensationAction (and its summary) are superseded — only the most recent summary is used by View.from_events

This is a rolling/incremental summary: each condensation builds on the previous summary, not on the full history.
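The steps above can be sketched with a stubbed summarizer in place of the LLM call. Everything here is illustrative: `Item`, `rolling_condense`, and `count_summary` are hypothetical names, the `is_summary` flag stands in for AgentCondensationObservation, and the function returns the resulting view directly instead of persisting a CondensationAction.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    text: str
    is_summary: bool = False  # stands in for AgentCondensationObservation


def rolling_condense(
    view: list[Item],
    summarize: Callable[[str, list[str]], str],
    max_size: int = 100,
    keep_first: int = 1,
) -> list[Item]:
    target_size = max_size // 2
    events_from_tail = target_size - keep_first - 1  # one slot is reserved for the summary
    if events_from_tail < 1 or len(view) <= keep_first:
        raise ValueError("view or max_size too small for this sketch")
    # Step 1: reuse the previous summary if one sits right after the head.
    previous = view[keep_first].text if view[keep_first].is_summary else ""
    # Step 2: the middle chunk being dropped (the old summary is not re-summarized as an event).
    forgotten = [i for i in view[keep_first:-events_from_tail] if not i.is_summary]
    # Step 3: summarize the dropped events with awareness of the previous summary.
    new_summary = summarize(previous, [i.text for i in forgotten])
    # Steps 4-5: the new summary replaces the old one at offset keep_first.
    return view[:keep_first] + [Item(new_summary, True)] + view[-events_from_tail:]


def count_summary(previous: str, texts: list[str]) -> str:
    # Stub in place of an LLM: just records how many events were folded in.
    note = f"{len(texts)} events"
    return f"{previous} + {note}" if previous else note


history = [Item("sys")] + [Item(f"e{i}") for i in range(1, 10)]
first = rolling_condense(history, count_summary, max_size=8)
second = rolling_condense(first + [Item(f"e{i}") for i in range(10, 16)], count_summary, max_size=8)
print([i.text for i in second])
# ['sys', '7 events + 6 events', 'e14', 'e15']
```

The second call never sees events e1 to e7; it only sees their summary, which is what makes the scheme incremental.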


Event Truncation vs. Content Masking

Two different approaches exist in parallel:

Event erasure (via CondensationAction): entire events are removed from the view. Used by rolling condensers and AmortizedForgettingCondenser.

Content replacement (in-place View mutation): event objects are replaced with placeholder AgentCondensationObservation objects. Used by:

  • ObservationMaskingCondenser: <MASKED>
  • BrowserOutputCondenser: "Visited URL {url}\nContent omitted"

Content replacement happens every step (not recorded in event store). Event erasure is persistent.


Configuration

Via config.toml:

[condenser]
type = "llm"          # or: noop, observation_masking, browser_output, recent_events,
                      #     amortized_forgetting, conversation_window,
                      #     structured_summary, llm_attention, pipeline
max_size = 100        # max events before condensation
keep_first = 1        # events always kept at head
max_event_length = 10000  # chars per event before truncation in prompt
llm_config = "condenser_llm"  # name of [llm.condenser_llm] section

# Example: use a cheaper model for condensation
[llm.condenser_llm]
model = "gpt-4o-mini"
api_key = "..."

For pipeline:

[condenser]
type = "pipeline"

[[condenser.condensers]]
type = "browser_output"
attention_window = 1

[[condenser.condensers]]
type = "llm"
max_size = 80

Edge Cases

| Scenario | Handling |
|---|---|
| No condensation ever requested (default config) | ConversationWindowCondenser.should_condense() returns False, so no condensation occurs |
| Multiple condensations | Rolling summary: each CondensationAction supersedes the previous. View.from_events uses only the last summary. |
| LLMSummarizingCondenser with no prior summary | summary_event.message = "No events summarized", fed as empty <PREVIOUS SUMMARY> |
| StructuredSummaryCondenser tool call parse failure | Falls back to empty StateSummary() with warning |
| LLMSummarizingCondenser prompt caching | Disabled explicitly: llm_config.caching_prompt = False (summary changes each time, caching wastes write credits) |
| Event reordering/gaps | forgotten is a set; View.from_events checks event.id not in forgotten_event_ids, which handles sparse event IDs |
| keep_first >= max_size // 2 | Raises ValueError at init |

Key Files

| File | Purpose |
|---|---|
| openhands/memory/condenser/condenser.py | Abstract Condenser, RollingCondenser, Condensation, View types, registry |
| openhands/memory/view.py | View.from_events(): reconstructs filtered view from event store |
| openhands/memory/condenser/impl/llm_summarizing_condenser.py | Text-based rolling LLM summary |
| openhands/memory/condenser/impl/structured_summary_condenser.py | Function-call structured summary (18 fields) |
| openhands/memory/condenser/impl/llm_attention_condenser.py | LLM importance ranking (no summary) |
| openhands/memory/condenser/impl/conversation_window_condenser.py | Default: window drop, no LLM |
| openhands/memory/condenser/impl/amortized_forgetting_condenser.py | Drop middle, no LLM |
| openhands/memory/condenser/impl/observation_masking_condenser.py | Mask old observations in-place |
| openhands/memory/condenser/impl/browser_output_condenser.py | Mask old browser screenshots in-place |
| openhands/memory/condenser/impl/pipeline.py | Chain multiple condensers |
| openhands/core/config/condenser_config.py | All config types, condenser_config_from_toml_section() |
| openhands/core/config/agent_config.py | Default condenser = ConversationWindowCondenserConfig |
| openhands/agenthub/codeact_agent/codeact_agent.py | Agent loop: calls condensed_history(), handles Condensation return |
| openhands/events/action/agent.py | CondensationAction, CondensationRequestAction event types |

OpenHands Compression Prompts

Repo: https://github.com/All-Hands-AI/OpenHands
Commit: bf769d1
Key files:

  • openhands/memory/condenser/ — condenser plugin system
  • openhands/memory/condenser/impl/llm_summarizing_condenser.py — text-based LLM summary prompt
  • openhands/memory/condenser/impl/structured_summary_condenser.py — structured function-call prompt
  • openhands/memory/condenser/impl/llm_attention_condenser.py — importance-ranking prompt
  • openhands/core/config/condenser_config.py — condenser configs + defaults

Overview

OpenHands has three different LLM-based condensers, each with its own prompt, plus six non-LLM strategies. All are configured via config.toml; the default (no config) is ConversationWindowCondenser, which uses no LLM.


Prompt 1: LLMSummarizingCondenser

openhands/memory/condenser/impl/llm_summarizing_condenser.py

Config: type = "llm" in [condenser] section. Threshold: max_size events (default: 100).

The prompt is assembled inline in get_condensation():

You are maintaining a context-aware state summary for an interactive agent.
You will be given a list of events corresponding to actions taken by the agent, and the most recent previous summary if one exists.
If the events being summarized contain ANY task-tracking, you MUST include a TASK_TRACKING section to maintain continuity.
When referencing tasks make sure to preserve exact task IDs and statuses.

Track:

USER_CONTEXT: (Preserve essential user requirements, goals, and clarifications in concise form)

TASK_TRACKING: {Active tasks, their IDs and statuses - PRESERVE TASK IDs}

COMPLETED: (Tasks completed so far, with brief results)
PENDING: (Tasks that still need to be done)
CURRENT_STATE: (Current variables, data structures, or relevant state)

For code-specific tasks, also include:
CODE_STATE: {File paths, function signatures, data structures}
TESTS: {Failing cases, error messages, outputs}
CHANGES: {Code edits, variable updates}
DEPS: {Dependencies, imports, external calls}
VERSION_CONTROL_STATUS: {Repository state, current branch, PR status, commit history}

PRIORITIZE:
1. Adapt tracking format to match the actual task type
2. Capture key user requirements and goals
3. Distinguish between completed and pending tasks
4. Keep all sections concise and relevant

SKIP: Tracking irrelevant details for the current task type

Example formats:

For code tasks:
USER_CONTEXT: Fix FITS card float representation issue
COMPLETED: Modified mod_float() in card.py, all tests passing
PENDING: Create PR, update documentation
CODE_STATE: mod_float() in card.py updated
TESTS: test_format() passed
CHANGES: str(val) replaces f"{val:.16G}"
DEPS: None modified
VERSION_CONTROL_STATUS: Branch: fix-float-precision, Latest commit: a1b2c3d

For other tasks:
USER_CONTEXT: Write 20 haikus based on coin flip results
COMPLETED: 15 haikus written for results [T,H,T,H,T,H,T,T,H,T,H,T,H,T,H]
PENDING: 5 more haikus needed
CURRENT_STATE: Last flip: Heads, Haiku count: 15/20

<PREVIOUS SUMMARY>
{previous_summary_content}
</PREVIOUS SUMMARY>

<EVENT id={id}>
{event_str}
</EVENT>
...

Now summarize the events using the rules above.

Output format: Free-form key-value text with section headers (USER_CONTEXT, COMPLETED, PENDING, etc.). The output becomes the content of an AgentCondensationObservation injected at summary_offset = keep_first (position 1 by default) in the event view.


Prompt 2: StructuredSummaryCondenser

openhands/memory/condenser/impl/structured_summary_condenser.py

Config: type = "structured_summary". Requires function-calling support. Threshold: max_size events (default: 100).

System prompt (assembled inline):

You are maintaining a context-aware state summary for an interactive software agent. This summary is critical because it:
1. Preserves essential context when conversation history grows too large
2. Prevents lost work when the session length exceeds token limits
3. Helps maintain continuity across multiple interactions

You will be given:
- A list of events (actions taken by the agent)
- The most recent previous summary (if one exists)

Capture all relevant information, especially:
- User requirements that were explicitly stated
- Work that has been completed
- Tasks that remain pending
- Current state of code, variables, and data structures
- The status of any version control operations

<PREVIOUS SUMMARY>
{previous_summary_content}
</PREVIOUS SUMMARY>

<EVENT id={id}>
{event_str}
</EVENT>
...

Tool definition (forces structured output via function calling):

{
  "type": "function",
  "function": {
    "name": "create_state_summary",
    "description": "Creates a comprehensive summary of the current state of the interaction to preserve context when history grows too large. You must include non-empty values for user_context, completed_tasks, and pending_tasks.",
    "parameters": {
      "type": "object",
      "properties": {
        "user_context":        { "type": "string", "description": "Essential user requirements, goals, and clarifications in concise form." },
        "completed_tasks":     { "type": "string", "description": "List of tasks completed so far with brief results." },
        "pending_tasks":       { "type": "string", "description": "List of tasks that still need to be done." },
        "current_state":       { "type": "string", "description": "Current variables, data structures, or other relevant state information." },
        "files_modified":      { "type": "string", "description": "List of files that have been created or modified." },
        "function_changes":    { "type": "string", "description": "List of functions that have been created or modified." },
        "data_structures":     { "type": "string", "description": "List of key data structures in use or modified." },
        "tests_written":       { "type": "string", "description": "Whether tests have been written for the changes. True, false, or unknown." },
        "tests_passing":       { "type": "string", "description": "Whether all tests are currently passing. True, false, or unknown." },
        "failing_tests":       { "type": "string", "description": "List of names or descriptions of any failing tests." },
        "error_messages":      { "type": "string", "description": "List of key error messages encountered." },
        "branch_created":      { "type": "string", "description": "Whether a branch has been created for this work. True, false, or unknown." },
        "branch_name":         { "type": "string", "description": "Name of the current working branch if known." },
        "commits_made":        { "type": "string", "description": "Whether any commits have been made. True, false, or unknown." },
        "pr_created":          { "type": "string", "description": "Whether a pull request has been created. True, false, or unknown." },
        "pr_status":           { "type": "string", "description": "Status of any pull request: 'draft', 'open', 'merged', 'closed', or 'unknown'." },
        "dependencies":        { "type": "string", "description": "List of dependencies or imports that have been added or modified." },
        "other_relevant_context": { "type": "string", "description": "Any other important information that doesn't fit into the categories above." }
      },
      "required": ["user_context", "completed_tasks", "pending_tasks"]
    }
  }
}

Output format: The parsed StateSummary object is rendered as markdown:

# State Summary

## Core Information

**User Context**: ...
**Completed Tasks**: ...
**Pending Tasks**: ...
**Current State**: ...

## Code Changes

**Files Modified**: ...
**Function Changes**: ...
**Data Structures**: ...
**Dependencies**: ...

## Testing Status

**Tests Written**: ...
**Tests Passing**: ...
**Failing Tests**: ...
**Error Messages**: ...

## Version Control

**Branch Created**: ...
**Branch Name**: ...
**Commits Made**: ...
**PR Created**: ...
**PR Status**: ...

## Additional Context

**Other Relevant Context**: ...
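The parse-then-render flow, including the fallback to an empty summary when the tool call cannot be parsed, might look roughly like the sketch below. It is not the OpenHands code: this `StateSummary` is a hypothetical three-field dataclass (the real schema has more fields), and `parse_summary` and `render` are illustrative helpers.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class StateSummary:
    # Only the three required fields are modelled in this sketch.
    user_context: str = ""
    completed_tasks: str = ""
    pending_tasks: str = ""


def parse_summary(arguments_json: str) -> StateSummary:
    """Parse tool-call arguments; fall back to an empty summary on bad input."""
    try:
        data = json.loads(arguments_json)
        known = {f.name for f in fields(StateSummary)}
        return StateSummary(**{k: str(v) for k, v in data.items() if k in known})
    except (json.JSONDecodeError, AttributeError, TypeError):
        return StateSummary()


def render(summary: StateSummary) -> str:
    """Render the summary in the markdown layout shown above."""
    lines = ["# State Summary", "", "## Core Information", ""]
    for f in fields(summary):
        label = f.name.replace("_", " ").title()
        lines.append(f"**{label}**: {getattr(summary, f.name)}")
    return "\n".join(lines)
```

Unknown keys are ignored rather than rejected, so a model that invents an extra field still produces a usable summary.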

Prompt 3: LLMAttentionCondenser

openhands/memory/condenser/impl/llm_attention_condenser.py

Config: type = "llm_attention". Requires response_schema support. Threshold: max_size events (default: 100).

Unlike the other two, this condenser does not produce a summary. Instead, it asks the LLM to rank events by importance and keeps the top-ranked ones.

You will be given a list of actions, observations, and thoughts from a coding agent.
Each item in the list has an identifier. Please sort the identifiers in order of how important the
contents of the item are for the next step of the coding agent's task, from most important to least
important.

Then each event is passed as a separate user message:

<ID>{event_id}</ID>
<CONTENT>{event_content}</CONTENT>

Structured output schema:

{
  "type": "json_schema",
  "json_schema": {
    "schema": {
      "properties": { "ids": { "items": { "type": "integer" }, "type": "array" } },
      "required": ["ids"],
      "type": "object"
    }
  }
}

The LLM returns an ordered list of event IDs (most → least important). The condenser keeps the fixed keep_first head events plus the top-ranked IDs, up to the post-condensation target of max_size // 2 events in total.
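Selecting events from a ranked ID list can be sketched as below. The `select_top_ranked` helper is hypothetical, and the number of ranked IDs to keep is passed in as `top_n` rather than derived from the condenser settings. The sketch also assumes that IDs the model invents, repeats, or that already belong to the head should be skipped.

```python
def select_top_ranked(
    view_ids: list[int],
    ranked_ids: list[int],
    top_n: int,
    keep_first: int = 1,
) -> list[int]:
    """Keep the head plus the top_n best-ranked IDs, in original view order."""
    head = view_ids[:keep_first]
    valid = set(view_ids) - set(head)
    chosen: list[int] = []
    for event_id in ranked_ids:
        if len(chosen) >= top_n:
            break
        # Skip hallucinated IDs, head IDs, and duplicates.
        if event_id in valid and event_id not in chosen:
            chosen.append(event_id)
    keep = set(head) | set(chosen)
    return [i for i in view_ids if i in keep]


print(select_top_ranked([10, 11, 12, 13, 14, 15], [14, 10, 99, 12, 11, 13, 15], top_n=2))
# [10, 12, 14]
```

Returning the kept IDs in view order rather than rank order keeps the surviving events chronologically coherent for the agent.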


Agent-Triggered Condensation

The agent model itself can request condensation by calling the request_condensation tool:

CondensationRequestTool = {
  "type": "function",
  "function": {
    "name": "request_condensation",
    "description": "Request a condensation of the conversation history when the context becomes too long or when you need to focus on the most relevant information.",
    "parameters": { "type": "object", "properties": {}, "required": [] }
  }
}

This produces a CondensationRequestAction in the event store. The next time the condenser runs, view.unhandled_condensation_request = True, which triggers ConversationWindowCondenser (and all RollingCondenser subclasses via should_condense).
