Skip to content

Instantly share code, notes, and snippets.

@VictorTaelin
Last active April 26, 2024 17:31
Show Gist options
  • Star 141 You must be signed in to star a gist
  • Fork 5 You must be signed in to fork a gist
  • Save VictorTaelin/d293328f75291b23e203e9d9db9bd136 to your computer and use it in GitHub Desktop.
Save VictorTaelin/d293328f75291b23e203e9d9db9bd136 to your computer and use it in GitHub Desktop.
Notes on the GPT-4 abbreviations tweet

Notes on this tweet.

  • The screenshots were taken on different sessions.

  • The entire sessions are included on the screenshots.

  • I lost the original prompts, so I had to reconstruct them, and still managed to reproduce.

  • The "compressed" version is actually longer! Emojis and abbreviations use more tokens than common words.

  • In practice, plain, well-designed summaries should be optimal to fit larger documents in the context.

  • This concept has potential, though; building lookup tables seems to outperform long text summarization.

  • No idea why this is still going, you're too impressionable! But glad to entertain you I guess. ¯\_(ツ)_/¯

Prompts to reproduce.

Compress:

Compress the following text in a way that fits a Tweet, and such that you (GPT-4) can reconstruct it as close as possible to the original. This is for yourself. Do not make it human readable. Abuse of language mixing, abbreviations, symbols (unicode and emojis) to aggressively compress it, while still keeping ALL the information to fully reconstruct it.

## Text to compress:

Lambda calculus may be untyped or typed. In typed lambda calculus, functions can be applied only if they are capable of accepting the given input's "type" of data. Typed lambda calculi are weaker than the untyped lambda calculus, which is the primary subject of this article, in the sense that typed lambda calculi can express less than the untyped calculus can. On the other hand, typed lambda calculi allow more things to be proven. For example, in the simply typed lambda calculus it is a theorem that every evaluation strategy terminates for every simply typed lambda-term, whereas evaluation of untyped lambda- terms need not terminate. One reason there are many different typed lambda calculi has been the desire to do more (of what the untyped calculus can do) without giving up on being able to prove strong theorems about the calculus.

Decompress:

I asked you to compress a long text using your own abbreviations. You replied with:

LmbC_untyped/typed.TypLmbC:funcs_appl✅if
input"type"👌.Typ<untyp(🎯)=exprss🔽.TypLmbC:prov➕.SmplTypLmbC:⏹️eval_strat_term
.💡untypLmb-term:may_not_term.🔍variousTypLmbC:do➕(untyp),💪thrm_provs.

Reconstruct the original text.

Settings:

Model             : GPT-4
Temperature       : 0.7
Top P             : 1
Frequency penalty : 0
Presence penalty  : 0

Stop

Seriously everyone, this prompt is silly, stop hyping it. This is just GPT trying to make abbreviations, there is no alien language or AI ghost trying to scape here. If you want to play with this, try improving the prompt. Remove the Tweet limit, ask it to abbreviate all words, give it some examples of how you want it to be done. One way this kind of thing can be useful is to pre-define 1-token synonyms (lookup tables) when building interactive prompts. This way you can save a lot of space on recursive interactions, which is actually desirable given that these have quadratic costs.

@IgnacioPardo
Copy link

Would be fun to run comp/decomp metrics on text batches as to show consistency.

For reference: A Dataset and Evaluation Metrics for Abstractive Compression of Sentences
and Short Paragraphs

@nasirus
Copy link

nasirus commented Apr 3, 2023

Thank you for sharing 💕

@mikolalysenko
Copy link

According to GPT4:

LmbC_untyped/typed: This is referring to Lambda Calculus (LmbC) in both its untyped and typed forms.

.TypLmbC: This is shorthand for "Typed Lambda Calculus."

funcs_appl✅if: This means "Functions can be applied if."

input"type"👌: This is saying that the input type is correct (👌).

.Typ<untyp(🎯)=exprss🔽: This denotes that the typed Lambda Calculus (Typ) is more restrictive than untyped Lambda Calculus (untyp), represented by the less-than symbol (<). The target symbol (🎯) indicates the goal or objective. The equal sign followed by the down arrow (🔽) represents that this restrictiveness leads to a reduction in the expressiveness of the language.

.TypLmbC:prov➕: This is shorthand for "Typed Lambda Calculus provides additional benefits (➕)."

SmplTypLmbC:⏹️eval_strat_term: This means "Simple Typed Lambda Calculus stops (⏹️) the evaluation strategy at a certain point (term)."

.💡untypLmb-term:may_not_term: This denotes "In untyped Lambda Calculus, the evaluation may not terminate (term)."

.🔍variousTypLmbC:do➕(untyp),💪thrm_provs: This is saying "Investigate (🔍) various Typed Lambda Calculus systems that offer more benefits (➕) than untyped ones and stronger (💪) theorem proving capabilities (thrm_provs)."

@qfeuilla
Copy link

qfeuilla commented Apr 5, 2023

If you use this instead

Compress the following text in a way such that you (GPT-4) can reconstruct it as close as possible to the original. This is for yourself. Do not make it human readable. Aggressively compress it, while still keeping ALL the information to fully reconstruct it. Also use the fewest token possible, your text should be way smaller than the one I give you

It does compress the text drastically (2 times less)

Result:

LC_ut_Typ; f_appl_if_accept; TypLC_weak; untypedLC_primary; TypLC_prove_more; STLC_theorem_term; eval_untypedLC_nterm; diff_TypLC_calc_desire; strong_theorems.

Decompression:

In the study of lambda calculus (LC), we examine various types, such as the simply-typed lambda calculus (TypLC) and untyped lambda calculus. For some applications, it is important to consider if a given lambda calculus is acceptable for the specific problem. Typically, typed lambda calculus has weaker properties compared to untyped lambda calculus. Untyped lambda calculus is primarily used in certain areas due to its flexibility. However, typed lambda calculus often provides more powerful proof techniques and mathematical rigor. The simply-typed lambda calculus (STLC) includes theorems and terms that help to analyze and understand the properties of the system. The evaluation of untyped lambda calculus relies on normal-order terms (nterms). To differentiate between various typed lambda calculi, we often calculate and compare their desired properties, such as expressiveness and type safety. Finally, strong theorems can be derived from well-defined lambda calculi to prove critical properties and results in the field of programming languages and type theory.

@MikeStirner
Copy link

Wow, so as far as token lengths:
181 for original
119 for the tweeter's compression
70 for the comment above

@thiswillbeyourgithub
Copy link

Two things:

First, what happens if you ask it to encode step by step? That could shed some lights on how it's done.

Second, isn't it possible to use this kind of feat to compile code? I'm sure part of the future for computer science is to have compilers be made by AIs that optimize the s*** out code. Next step is to have CPU gates optimized by AIs to take advantage of this kind of language and compiled code.

What do you think?

@MikeStirner
Copy link

Regarding the optimization of CPU gates, AI and machine learning algorithms have been used in designing more efficient hardware. AI can potentially optimize the hardware architecture to better suit the generated code or specific workloads, leading to improved performance and energy efficiency. This process can be iterative, with AI models learning and adapting the hardware design based on feedback from real-world use and testing.

@jimsrc
Copy link

jimsrc commented Apr 8, 2023

I just digged a bit more to generalize the compression technique. See my gist.

@zachary-kaelan
Copy link

@MikeStirner

Wow, so as far as token lengths: 181 for original 119 for the tweeter's compression 70 for the comment above

I tried the original compression prompt with GPT-3.5 and got this, which is also 70 tokens:

Lambda calculus: typed=apply if capable of given "type" data. Typed weaker than untyped, can express less. Typed allows more proven things, e.g. simply typed=theorem-every eval strat terminates, unlike untyped. Many typed lambda calc due to wanting more without giving up strong theorem

Decompresses to this:

Lambda calculus comes in two forms: typed and untyped. In typed lambda calculus, functions can only be applied if they are capable of processing a given "type" of data. This makes typed lambda calculus weaker than untyped lambda calculus, as it can express less. However, the advantage of typed lambda calculus is that it allows for more proven theorems. For example, the simply typed lambda calculus proves that every evaluation strategy terminates, unlike untyped lambda calculus. Many variations of typed lambda calculus exist because people want more expressive power without sacrificing strong theorems.

GPT-4 seems to be far better at reducing the character count, but full words are still 1-3 tokens so it is completely pointless to replace them with abbreviations and emojis.

Example: Lambda calculus: typed and LmbC_typed are both 6 tokens. One just looks more compressed.

A more ridiculous example: GPT-4 changed express to exprss, increasing it from 1 token to 2.

@MM-IR
Copy link

MM-IR commented Apr 29, 2023

Thank you for sharing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment