
jimsrc / gpt4_text_compression.md
Last active February 21, 2024 22:16
Minimizing the number of tokens used to interact with GPT-4.

Overview

I just read about this trick for text compression, intended to save tokens in subsequent interactions during a long conversation, or in a subsequent long text to summarize.

SHORT VERSION:

It's useful to define a mapping from common words (or phrases) in a long text that one intends to pass later, to shorter symbols. Then pass that long text to GPT-4 encoded with the mapping, along with the mapping itself. The idea is that the encoded version contains fewer tokens than the original text. Several techniques can identify frequent words or phrases in a given text, such as named-entity recognition (NER), TF-IDF, and part-of-speech (POS) tagging.
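
To make the idea concrete, here is a minimal sketch in Python (this is not code from the gist): it finds frequent long words with a simple frequency count, maps each to a short placeholder, and builds a prompt containing the mapping plus the encoded text. All names here (`build_mapping`, `encode`, the `~0`-style codes, `long_text.txt`) are illustrative assumptions, and whether a given code actually tokenizes shorter than the word it replaces must be verified against the model's tokenizer.

```python
import re
from collections import Counter

def build_mapping(text: str, top_n: int = 10, min_len: int = 6) -> dict[str, str]:
    """Map the top_n most frequent long words to short placeholder codes."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    frequent = Counter(w for w in words if len(w) >= min_len).most_common(top_n)
    # Codes like ~0, ~1, ... are assumed to be cheap, but token counts
    # depend on the tokenizer, so check the savings before relying on them.
    return {w: f"~{i}" for i, (w, _) in enumerate(frequent)}

def encode(text: str, mapping: dict[str, str]) -> str:
    """Replace each mapped word with its short code."""
    for word, code in mapping.items():
        text = re.sub(rf"\b{re.escape(word)}\b", code, text, flags=re.IGNORECASE)
    return text

long_text = open("long_text.txt").read()  # the long text you plan to pass later
mapping = build_mapping(long_text)
prompt = (
    "Decode the text below using this mapping before answering:\n"
    + "\n".join(f"{code} = {word}" for word, code in mapping.items())
    + "\n\n"
    + encode(long_text, mapping)
)
```

A tokenizer library such as OpenAI's tiktoken can then compare the token counts of the encoded prompt and the original text to confirm the trick actually saves anything; as the notes below point out, it may not.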

VictorTaelin / gpt4_abbreviations.md
Last active April 26, 2024 17:31
Notes on the GPT-4 abbreviations tweet

Notes on this tweet.

  • The screenshots were taken on different sessions.

  • The entire sessions are included in the screenshots.

  • I lost the original prompts, so I had to reconstruct them, but still managed to reproduce the behavior.

  • The "compressed" version is actually longer! Emojis and abbreviations use more tokens than common words.