@danielrosehill
Created May 4, 2025 17:41
Summary of an email exchange with Anthropic Support (Feb–Mar 2025) on optimizing prompt caching for long conversations in Claude's API.

Q&A: Optimizing Prompt Caching for Long Inputs in Anthropic's API

This is a paraphrased summary of a support response from Anthropic (received around February or March 2025), explaining how to optimize prompt caching when working with long or ongoing conversations using Claude.


Q: How can I optimize prompt caching when working with long prompts or ongoing conversations in Anthropic’s API?

A: Prompt caching is an effective way to optimize API performance and reduce costs by reusing previously cached prompt segments. Here's a general strategy for applying it effectively in long conversations:


How Prompt Caching Works

You can think of prompt caching as working with a series of segments:

  • [a] (write)
  • [a] (read) [b] (writes [a, b])
  • [a, b] (read) [c] (writes [a, b, c])
  • [a, b, c] (read) [d] (writes [a, b, c, d])

Each step appends to the prompt history, and the cache system only stores the deltas. When implemented properly, this significantly reduces token processing for repeated prompt structures.
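The segment sequence above can be sketched as a toy model. This is not the real API, just an illustration of the read/write pattern: each call reads back the longest previously cached prefix and writes only the new delta.

```python
# Toy model of incremental prompt caching (illustration only, not the API):
# each call reads the longest previously cached prefix and writes the rest.

def cache_step(cache: set, segments: tuple) -> tuple:
    """Return (read, written) for one call whose prompt is `segments`."""
    # The longest prefix already in the cache is read back cheaply.
    read = ()
    for i in range(len(segments), 0, -1):
        if segments[:i] in cache:
            read = segments[:i]
            break
    # Everything after that prefix is newly processed and written to cache.
    written = segments[len(read):]
    cache.add(segments)  # the full prefix is now cached for the next call
    return read, written

cache = set()
print(cache_step(cache, ("a",)))             # ((), ('a',))          write a
print(cache_step(cache, ("a", "b")))         # (('a',), ('b',))      read a, write b
print(cache_step(cache, ("a", "b", "c")))    # (('a', 'b'), ('c',))  read a,b, write c
```

Note how the token cost of each call is dominated by the single new segment, not the full history.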


Key Implementation Tips

  • Stable Prefix: Use a consistent system prompt (e.g. [x]) as a shared prefix across multiple conversations. This helps unify caching logic and prevents redundant writes.

  • Recent Turns Strategy:

    • Reserve one breakpoint for system/tools prompts.
    • Use the remaining three cache points for the most recent user-assistant message pairs (i.e., three full turns).
  • Automatic Handling: The _inject_prompt_caching() helper function (from Anthropic’s SDK or example code) can manage cache segmentation automatically.
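A minimal sketch of the tips above, in the spirit of the `_inject_prompt_caching()` helper (the function name and exact breakpoint placement here are assumptions; the `cache_control: {"type": "ephemeral"}` payload shape matches Anthropic's documented prompt-caching API):

```python
# Sketch: place one cache breakpoint on the stable system prompt and use
# the remaining three on the most recent user turns. Assumes at most four
# breakpoints per request, per Anthropic's prompt-caching docs.

def inject_prompt_caching(system_text: str, messages: list) -> tuple:
    """Return (system_blocks, annotated_messages) with cache breakpoints."""
    system_blocks = [{
        "type": "text",
        "text": system_text,
        "cache_control": {"type": "ephemeral"},  # breakpoint 1: stable prefix
    }]
    out = [dict(m) for m in messages]  # shallow copies; don't mutate input
    # Breakpoints 2-4: the three most recent user messages.
    remaining = 3
    for m in reversed(out):
        if remaining == 0:
            break
        if m["role"] == "user":
            m["content"] = [{
                "type": "text",
                "text": m["content"],
                "cache_control": {"type": "ephemeral"},
            }]
            remaining -= 1
    return system_blocks, out
```

The returned structures would then be passed as the `system` and `messages` arguments of a Messages API call.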


Cache Continuity

  • By keeping a consistent cache structure across calls and maintaining the last three turns, you avoid breaking the cache.
  • If older segments roll off due to cache limits, they only affect that specific segment—not the entire cached prompt.
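The continuity property rests on the conversation being append-only: each call's prompt is a strict extension of the previous one, so earlier cached segments are never invalidated by a new turn. A small self-contained check (the serialization format here is a stand-in, not the API's wire format):

```python
# Sketch: an append-only conversation means every call's prompt extends the
# previous call's prompt, so previously cached prefixes remain valid.

def prompt_prefixes(history: list) -> list:
    """Serialized prompt prefix as it would stand after each turn."""
    prefixes, acc = [], ""
    for role, text in history:
        acc += f"{role}:{text}\n"
        prefixes.append(acc)
    return prefixes

history = [("user", "u1"), ("assistant", "a1"), ("user", "u2")]
p = prompt_prefixes(history)
# Each prefix extends the previous one; appending a turn never rewrites
# what was already sent, which is what keeps the cache intact.
assert all(p[i + 1].startswith(p[i]) for i in range(len(p) - 1))
```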

Note: This information is based on support guidance provided in early 2025. Be sure to consult the latest Anthropic API documentation or support channels for any updates.


This gist was generated with the help of OpenAI based on a summarized support conversation with Anthropic.
