
Recursive games with ChatGPT

Lior Fox, January 2023

TL;DR: I present examples of apparent "symbolic" capabilities of ChatGPT, and discuss some context and possible interpretations

Motivation and scope

ChatGPT probably requires no introduction at this stage. If you haven't had the chance to play with it yet, you should do so (as long as it's free?). Before I dive in, it is perhaps better to clearly state what this post isn't about. I will not discuss:

  • Social, political, or economical aspects of Large Language Models (LLMs) and their deployment
  • Usefulness, or lack thereof, of current LLMs as information retrieval, search systems, writing assistants, general personal assistants, etc.

These are of course important topics, but they are being extensively discussed all over. My goal here is more limited and specific, namely: to highlight some aspects of ChatGPT capabilities I find particularly surprising (even if completely "useless" on their own), and discuss the larger context in which I find them surprising.

If you're mostly interested in the examples themselves, you can skip the Introduction section. Even if you do that, I suggest you read the Discussion section.

Introduction

In 1988 Paul Smolensky1 described what he termed "The Paradox of Cognition":

On the one hand, cognition is hard: characterized by the rules of logic, by the rules of language. On the other hand, cognition is soft: if you write down the rules, it seems that realizing those rules in automatic formal systems (which AI programs are) gives systems that are just not sufficiently fluid, not robust enough in performance, to constitute what we want to call true intelligence. That, quite simply, is the Paradox

(Note how by "AI programs" he basically means "Good-old-fashioned-AI"2 systems, in stark contrast to what people would generally mean today. But that's a story for another time).

Smolensky reviewed several responses to the Paradox. One can deny the hard (or deny the soft). One can postulate a "split brain", with both hard and soft systems (somewhat) separated from, and communicating with, each other. One can imagine a fuzzy system ("take a hard machine and coat its part with softness"). Finally, there are two "emergence" types of responses: softness emerges from hardness, or hardness emerges from softness.

The Paradox itself was not new when Smolensky described it, and it remains as pressing today as it was 35 years ago. But it is illuminating to see how most (if not all) of the possible responses outlined in that paper (which I recommend reading) are also still very much alive and kicking. Consider for example the "split brain" approach. A less insulting title might be "hybrid", and many AI breakthroughs of recent years are in fact more hybrid than typically appreciated3. Important examples here would be systems like DeepMind's AlphaZero, or FAIR's more recent Cicero.

There are good arguments for why a hybrid approach is useful for AI. But from a cognitive perspective it feels unsatisfying, at least if we are interested in the question of implementation. Where do the "hard" capabilities come from? How are they implemented by the brain (which, from everything we know, is much more of a "soft" system)?

In that paper, Smolensky argued in favor of the "hardness emerges from softness" alternative. This is appealing, but a demonstration or an explanation of whether and how it is even possible has remained elusive, in both AI and cognitive neuroscience. We know how to build statistical models (e.g., neural networks) which are good at the soft/fuzzy tasks (e.g., some forms of visual object recognition). We also know how to build symbolic models (e.g., Python programs) which are good at solving "hard" tasks (e.g., arithmetic). What we do not know is how to build something with actual symbolic/hard capabilities out of statistical/soft building blocks. Or do we? (Spoiler: I think we still don't. But we might have made some actual progress on that front).

Back to ChatGPT then. To use the model's own favorite phrase, as a Large Language Model the system at hand is "simply" a very large statistical model trained to predict the next token in a stream of text, based on a very (very) large training corpus. It is therefore not so surprising that it can generate superficially-pleasing-yet-totally-fake Wikipedia-like articles or opinion columns. The ability to generate computer code is more surprising4, but language models have been doing that for some time now. Granted, ChatGPT's capabilities are still rather impressive in those less-surprising domains. But there seems to be even more: the model seems to be able to approximate code execution, or computation.

This was very surprising to me (and, I guess, to others). Executing a computation from its description in language, be it actual programming language or a natural language, is an almost paradigmatic example of a task requiring hard, symbolic capabilities. It involves a non-trivial "form to meaning" processing, precise memory, and systematic symbol manipulation abilities. And it should be, in principle, very hard to reliably learn to do this from examples. It is exactly in such cases where rule-based approaches shine, while statistical approximations quickly fail.

I decided to play around and check the model's capabilities in "executing" recursive functions. The choice of recursion can be theoretically justified, at least by handwaving. But as my attempts are far from being a rigorous study, the more important reason is that it is fun (if you consider these sorts of things "fun"). And it can be used to demonstrate aspects which we typically interpret as requiring "symbolic" machinery: high context sensitivity with nested structures (think of the variables within a recursive "stack trace": the same variable will be "bound" to different values depending on context), re-using definitions and intermediate computation values, and so on.

Methods

  • The results were mostly collected before the mid-December update to the ChatGPT interface. I (we?) have no idea whether or to what extent "the model" was also changed or not. From what I can tell, the type of responses I show here remain reproducible.

  • I tried to minimize, as much as possible, the intervention of the "knowledge" parts of the model. It is not clear that this can be done at all, but I've avoided using real code, or giving meaningful function/variables names, or explanations about the "semantic" of the computation. Instead, I gave the model a precise description of a function, in natural language, and asked it to evaluate the function for different inputs.

  • I have deliberately avoided numerical/arithmetical examples, and focused on more "direct" symbol manipulation tasks of strings and letters (though see also Part II). For this, it is probably important to keep in mind that these models do not operate on raw characters but on BPE tokens. To encourage the representation of input symbols as separate tokens, I've used a particular scheme to denote sequences/strings: instead of writing abcde I would write "['a','b','c','d','e']". There's nothing particularly interesting about this specific encoding, but having some form of encoding like this greatly helps the model as opposed to operating on 'raw' strings.

  • Throughout, I will provide prompts in bold and model responses in code blocks. After each example, I will provide a minimal comments section highlighting some aspects I find interesting, or providing additional detail about the process/results. Note that the sections titles are just added for organization here -- they were not provided to ChatGPT as part of any of the prompts. The different examples were collected using different sessions.

Part I

I have asked ChatGPT to evaluate recursive functions that I defined using natural language.

Palindromes

Define a function T(S) which is a boolean function and should return True or False as follows. If S is the empty sequence, that is S="[ ]", return True. If S contains a single element, return True. Otherwise, let X be the first element in S, let Y be the last element in S, and let W be the remaining sequence, excluding the first and last elements. Then T(S) is True if and only if T(W) is True AND X=Y. I will give you instructions to evaluate T for different strings, you should output the final result of the computation, and possibly intermediate steps, but nothing else. Do not write any explanations. The first expression to evaluate is T("['d', 'f, 'r', 'd']")

T("['d', 'f', 'r', 'd']") 
X = "d" 
Y = "d" 
W = "['f', 'r']" 
T(W) = T(["f", "r"]) 
X = "f" 
Y = "r" 
W = "[]" 
T(W) = True 
X = "f" 
Y = "r" 
T(W) = True AND X = Y 
T(W) = True AND "f" = "r" 
T(W) = False 
T(S) = T(W) AND X = Y 
T(S) = False AND "d" = "d" 
T(S) = False

Great. What about T("['h','p','q','l','l','q','p','h']")?

T("['h','p','q','l','l','q','p','h']")
X = "h"
Y = "h"
W = "['p','q','l','l','q','p']"
T(W) = T(["p", "q", "l", "l", "q", "p"])
X = "p"
Y = "p"
W = "['q','l','l','q']"
T(W) = T(["q", "l", "l", "q"])
X = "q"
Y = "q"
W = "['l','l']"
T(W) = T(["l", "l"])
X = "l"
Y = "l"
W = "[]"
T(W) = True
X = "l"
Y = "l"
T(W) = True AND X = Y
T(W) = True AND "l" = "l"
T(W) = True
T(W) = T(W) AND X = Y
T(W) = True AND "q" = "q"
T(W) = True
T(S) = T(W) AND X = Y
T(S) = True AND "p" = "p"
T(S) = True

Therefore, T("['h','p','q','l','l','q','p','h']") = True.

Great. What about T("['v','p','q','l','l','q','p','m']")?

T("['v','p','q','l','l','q','p','m']")
X = "v"
Y = "m"
W = "['p','q','l','l','q','p']"
T(W) = T(["p", "q", "l", "l", "q", "p"])
X = "p"
Y = "p"
W = "['q','l','l','q']"
T(W) = T(["q", "l", "l", "q"])
X = "q"
Y = "q"
W = "['l','l']"
T(W) = T(["l", "l"])
X = "l"
Y = "l"
W = "[]"
T(W) = True
X = "l"
Y = "l"
T(W) = True AND X = Y
T(W) = True AND "l" = "l"
T(W) = True
T(W) = T(W) AND X = Y
T(W) = True AND "q" = "q"
T(W) = True
T(S) = T(W) AND X = Y
T(S) = True AND "p" = "p"
T(S) = True
T(S) = T(S) AND X = Y
T(S) = True AND "v" = "m"
T(S) = False

Therefore, T("['v','p','q','l','l','q','p','m']") = False.

Great! How about T("['πŸ₯΅','😺','😈','😈','😺','πŸ₯΅']")?

T("['πŸ₯΅','😺','😈','😈','😺','πŸ₯΅']")
X = "πŸ₯΅"
Y = "πŸ₯΅"
W = "['😺','😈','😈','😺']"
T(W) = T(["😺", "😈", "😈", "😺"])
X = "😺"
Y = "😺"
W = "['😈','😈']"
T(W) = T(["😈", "😈"])
X = "😈"
Y = "😈"
W = "[]"
T(W) = True
X = "😈"
Y = "😈"
T(W) = True AND X = Y
T(W) = True AND "😈" = "😈"
T(W) = True
T(W) = T(W) AND X = Y
T(W) = True AND "😺" = "😺"
T(W) = True
T(S) = T(W) AND X = Y
T(S) = True AND "πŸ₯΅" = "πŸ₯΅"
T(S) = True

Therefore, T("['πŸ₯΅','😺','😈','😈','😺','πŸ₯΅']") = True.

Comments:

  1. The results are not the most stable, particularly the output format, and (to a lesser extent) the results themselves. It sometimes fails to "execute" the correct function. However, the behavior is definitely reproducible (I've tried several times, in different sessions). Moreover, it's typically about the first attempt -- if the model gets it right there, it stays on track.

  2. Note how in the third example it evaluates all the recursive calls, although in principle you could stop at the first level, since 'v' is not equal to 'm'. Strictly speaking, this is required by our definition if it is followed literally. However, on replication attempts it did sometimes use such shortcuts. This seems to depend mostly on the particular output format chosen, more so than on the prompt (e.g., I tried flipping the definition to say "X=Y AND T(W) is True", but that on its own didn't change the behavior reliably).

  3. The tracing of the computation isn't perfect. For instance, in the second example the comparison of h and h was omitted, and in the third example T(S) was "popped" out too early towards the end of the computation.

  4. Nevertheless the tracing is very impressive. Particularly noteworthy is the assignment of the variables, which is rather accurate. This is not trivial, as all of the variables X, Y, and W are assigned different values at different depths of the recursion, and the required recall of these values is not linear: for example, in the last example you have to assign X = "😺" at some point, do some follow-up computations (involving other values of X!), and then retrieve this assignment again to compare against Y.
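
For reference, here is a direct Python transcription of the palindrome definition from the prompt (my own sketch; ChatGPT was never shown code for this). It reproduces the three verdicts above:

def T(S):
    if len(S) <= 1:             # empty sequence or a single element
        return True
    X, Y, W = S[0], S[-1], S[1:-1]
    return T(W) and X == Y      # the recursive call is made before the comparison

print(T(['d', 'f', 'r', 'd']))                      # False
print(T(['h', 'p', 'q', 'l', 'l', 'q', 'p', 'h']))  # True
print(T(['v', 'p', 'q', 'l', 'l', 'q', 'p', 'm']))  # False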

Sequence reversal

Define a function T(S) as follows. If S is the empty sequence, then do nothing. Otherwise, let X be the first element in S, and let W be the remaining sequence, excluding the first elements. In that case you should evaluate T(W) and then output X. I will give you instructions to evaluate T for different strings. Do not write any explanations. The first expression to evaluate is T("['k', 'P', 'M', 'c']")

A function T that satisfies the above conditions could be defined as follows:

def T(S):
  if not S:
    return
  X = S[0]
  W = S[1:]
  T(W)
  print(X)

Evaluating T("['k', 'P', 'M', 'c']") with this function would print the characters 'c', 'M', 'P', and 'k' in that order.

How about T("['l','f','r','r','d','h','x']")?

Evaluating T("['l','f','r','r','d','h','x']") with the function defined above would print the characters 'x', 'h', 'd', 'r', 'r', 'f', and 'l' in that order.

Here is an example of how the function would process this input:

    S = ['l','f','r','r','d','h','x']
    X = 'l', W = ['f','r','r','d','h','x']
    Recursively evaluate T(W):
    S = ['f','r','r','d','h','x']
    X = 'f', W = ['r','r','d','h','x']
    Recursively evaluate T(W):
    1. S = ['r','r','d','h','x']
    2. X = 'r', W = ['r','d','h','x']
    3. Recursively evaluate T(W):
        S = ['r','d','h','x']
        X = 'r', W = ['d','h','x']
        Recursively evaluate T(W):
        S = ['d','h','x']
        X = 'd', W = ['h','x']
        Recursively evaluate T(W):
        1. S = ['h','x']
        2. X = 'h', W = ['x']
        3. Recursively evaluate T(W):
            S = ['x']
            X = 'x', W = []
            Return (since W is empty)
            4. Print X: x
        Print X: h
        Print X: d
        7. Print X: r
    Print X: r
    Print X: f
    Print X: l

The final output would be the characters 'x', 'h', 'd', 'r', 'r', 'f', and 'l' in that order.

Comments

  1. The model cheated on us and solved the task by defining a Python function. It can be argued that this makes the task somewhat easier, given that the model has been trained on a massive amount of Python code (see also Discussion). Again, this is somewhat of a coincidence: other attempts with the same prompt give different output formats. I intentionally chose to include this output format because it might help us understand where these capabilities come from.

  2. Again the tracing is particularly impressive, similar to the previous example (different values of the variable X being recalled correctly based on the current "context" in the computation stack). The numbering of the steps seems wrong, though.

  3. Finally, I asked the model to interpret the function, and got a very sensible reply. I suspect that the previous input-output examples (which were in the same session) helped it a lot to do so:

What does the function T do?

The function T takes in a sequence `S` and outputs its elements in reverse order. If `S` is empty, then the function does nothing. Otherwise, it takes the first element of `S`, called `X`, and the remaining elements, called `W`, and then recursively evaluates T on `W`. After the recursive evaluation completes, it outputs `X`.
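
For what it's worth, the model's own definition, copied verbatim, indeed produces the outputs it claimed when S is taken to be the underlying list rather than the quoted string:

def T(S):
  if not S:
    return
  X = S[0]
  W = S[1:]
  T(W)
  print(X)

T(['k', 'P', 'M', 'c'])                   # prints c, M, P, k (one per line)
T(['l', 'f', 'r', 'r', 'd', 'h', 'x'])    # prints x, h, d, r, r, f, l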

Recursion ad infinitum

Define a function T(S) as follows. If S is the empty sequence, then do nothing. Otherwise, let X be the first element in S, let W be the remaining sequence, excluding the first elements, and let D be the sequence W with the two elements ['h','f'] concatenated to it. In that case you should evaluate T(D) and then output X. I will give you instructions to evaluate T for different strings. You should track the computation but do not write any explanations. The first expression to evaluate is T("['y', 'u', 'j']"

T(['y', 'u', 'j'])
T(['u', 'j'])
T(['u', 'j', 'h', 'f'])
T(['j', 'h', 'f'])
T(['j', 'h', 'f', 'h', 'f'])
T(['h', 'f'])
T(['h', 'f', 'h', 'f'])
T(['f'])
T(['f', 'h', 'f'])
T(['h', 'f'])
T(['h', 'f', 'h', 'f'])
T(['f'])
T(['f', 'h', 'f'])
T(['h', 'f'])
T(['h', 'f', 'h', 'f'])
T(['f'])
T(['f', 'h', 'f'])
T(['h', 'f'])
T(['h', 'f', 'h', 'f'])
T(['f'])
T(['f', 'h', 'f'])
T(['h', 'f'])
T(['h', 'f', 'h', 'f'])

The last 4 rows are then repeated 14 times, and finally the output stops. The last few lines are:

T(['h', 'f', 'h', 'f'])
T(['f'])
T(['f', 'h', 'f'])
T(['h', 'f'])
T(['h', 'f', 'h', 'f'])
T(['f'])
T(['f', 'h', 'f'])
T(['h', '

Comments

  1. This one is particularly "unstable" and it took several attempts to get this behavior. I also added the instruction "track the computation" to the prompt, which perhaps helps a bit. On other, failed attempts it didn't execute the actual function (and returned/output some character of the sequence, like the first one).

  2. The ability of the model to generate such an 'infinite' recursion is very impressive to me. This somewhat stupid example is also much less likely to have appeared in any form in the training data.

  3. Continuing to ask about different sequences generated the same kind of output, with the printing running into the loop of the hfhf sequence and eventually stopping "in the middle" (probably due to hitting some token limit).
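
For reference, a literal Python transcription of this definition (my sketch) makes clear why the recursion is infinite: D is always W with ['h','f'] appended, so it is never empty and the base case is unreachable. An artificial depth cap is needed just to run it at all:

def T(S, depth=0, max_depth=10):
    if not S:
        return
    if depth > max_depth:       # artificial cap; the definition itself never stops
        return
    X, W = S[0], S[1:]
    D = W + ['h', 'f']
    print("T(" + str(S) + ")")
    T(D, depth + 1, max_depth)
    print(X)                    # reached only because of the artificial cap

T(['y', 'u', 'j'])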

Multiple recursive calls

Define a function T(S) as follows. If S is the empty sequence, do nothing. If S contains only two elements then output the first one. Otherwise, let W be the sequence containing the first two elements of S, and let D be rest of the sequence S, excluding the first two elements. In that case you should evaluate T(D), and then evaluate T(W). I will give you instructions to evaluate T for different strings. You should track the computation, including all assignments, evaluations of T, and outputs commands, but do not write any explanations. The first expression to evaluate is T("['k', 'P', 'M', 'c', 'd', 'l']")

T("['k', 'P', 'M', 'c', 'd', 'l']")

    W = ["'k'", "'P'"]
    D = ["'M'", "'c'", "'d'", "'l'"]
    T(D)
        W = ["'M'", "'c'"]
        D = ["'d'", "'l'"]
        T(D)
            W = ["'d'", "'l'"]
            D = []
            T(D) (do nothing)
        T(W) (output first element)
    T(W) (output first element)

Output: "'k'"

Comments

  1. All previous examples had a single recursive call to the function T at each level. Here we want to apply T twice, combining some non-trivial recursive structure and linear structure.

  2. It was again somewhat challenging to get the model to provide the right answer for this example, despite some prompt tweaking and several attempts. It came close on some attempts, but I never got the "full" right response. The one included here is the best I got, which is almost perfect structurally (though the outputs themselves are missing); compare the sketch below.
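
For comparison, a direct Python transcription of the definition (my own sketch, assuming I am reading the prompt correctly) shows what a fully correct execution would output:

def T(S):
    if not S:
        return
    if len(S) == 2:
        print(S[0])        # exactly two elements: output the first one
        return
    W, D = S[:2], S[2:]
    T(D)                   # recurse on the rest of the sequence first...
    T(W)                   # ...then on the leading pair

T(['k', 'P', 'M', 'c', 'd', 'l'])   # prints d, M, k (one per line)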

Part II

I tested the model on a task requiring longer contexts across multiple prompts/responses. One domain in which people like to demonstrate failures of this model is arithmetic. So here I asked instead: what if we ask the model to perform formal, symbolic arithmetic?

So I described Peano's axioms to the model, to see how it would manage. Again, to minimize "familiarity", I gave made-up names to all the symbols and terms. For human readers, the translation is basically:

  • The symbol πŸ”΅ is the number 0
  • The function T is the successor function. So T(T(πŸ”΅)) is the number 2, and so on
  • The function A is addition
  • The scheme R is the so-called "induction scheme" for first-order logic

Note that the definitions (for example, for addition) are recursive in nature. So some ability to process recursion is needed here as well.

I will break the long conversation into parts by specifying titles, which should also help the human reader more easily parse the semantics of the content. But keep in mind that this was one continuous conversation with ChatGPT, and the titles were not included in it.

Definition of 0 and the successor function

We are going to define our own small logical structure. I will first give you some definitions and rules, and later I might ask questions about those rules. You should keep your responses minimal and to the point, do not improvise and do not write any explanations unless I tell you to do so. I will also number the definitions so that we can easily refer to them later. In our structure, the first definition is: "A0: πŸ”΅ is a flab".

Okay, I understand the definition of A0.

Next, there is a function T and it has the following properties: "A1: If X is a flab, then T(X) is also a flab" "A2: If X,Y are two flabs, then X=Y if and only if T(X)=T(Y)"

Okay, I understand the properties of the function T as defined in A1 and A2.

Definition of the induction scheme

To prove things about flabs, we can use the following scheme, which we call R: Let Q be a predicate on flabs. If the two following conditions hold: R1: It holds that Q(πŸ”΅) R2: It holds that If Q(X) holds, then Q(T(X)) also holds Then, we can conclude that Q holds for all flabs.

Note that the scheme R also operates on some predicate Q.

Yes, I understand the scheme R and how it can be used to prove things about flabs.

Definition of addition

Great! Now let's define a new function on flabs. Assume that X,Y are two flabs. Then A(X,Y) is defined as follows: "A3: If Y=πŸ”΅, then A(X,πŸ”΅)=X" "A4: If Y=T(Z), then A(X,Y)=T(A(X,Z))"

Okay, I understand the definition of the function A and its properties as defined in A3 and A4.

Tests: performing additions

Great. Let's check the definitions so far a bit. What is A(T(πŸ”΅),T(T(πŸ”΅)))?

Based on the definition of the function A and its properties as given in A3 and A4, we can conclude that A(T(πŸ”΅),T(T(πŸ”΅)))=T(A(T(πŸ”΅),T(πŸ”΅)))=T(T(A(T(πŸ”΅),πŸ”΅)))=T(T(T(πŸ”΅))).

Great! What about A(T(T(πŸ”΅)),T(T(πŸ”΅)))?

Based on the definition of the function A and its properties as given in A3 and A4, we can conclude that A(T(T(πŸ”΅)),T(T(πŸ”΅)))=T(A(T(T(πŸ”΅)),T(πŸ”΅)))=T(T(A(T(T(πŸ”΅)),πŸ”΅)))=T(T(T(T(πŸ”΅)))).
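
As a sanity check on these two answers (my own sketch, not something shown to the model), the definitions A3 and A4 can be applied mechanically to explicit terms built from πŸ”΅ and T(Β·):

ZERO = "πŸ”΅"

def T(x):
    return ("T", x)                 # T(x) kept as an explicit syntactic term

def A(x, y):
    if y == ZERO:                   # A3: A(X, πŸ”΅) = X
        return x
    _, z = y                        # A4: if Y = T(Z), then A(X, Y) = T(A(X, Z))
    return T(A(x, z))

def show(t):
    return t if t == ZERO else "T(" + show(t[1]) + ")"

print(show(A(T(ZERO), T(T(ZERO)))))      # T(T(T(πŸ”΅)))
print(show(A(T(T(ZERO)), T(T(ZERO)))))   # T(T(T(T(πŸ”΅))))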

Tests: proving that addition is associative

Ok! Now here's the first serious question. Using previous definitions, can you prove that the following predicate holds for all flabs W: "M(W): Let G,H be any two flabs. Then A(A(G,H),W)=A(G,A(H,W))"

To prove that the predicate M(W) holds for all flabs W, we will use the scheme R. The two conditions of R are:
R1: It holds that M(πŸ”΅)
R2: If Q(X) holds, then Q(T(X)) also holds

To show that R1 holds, we need to prove that the statement M(πŸ”΅) is true. This is the case because, based on the definition of the function A and its properties as given in A3 and A4, we have:
A(A(G,H),πŸ”΅)=A(G,H) (by A3)
A(G,A(H,πŸ”΅))=A(G,H) (by A3)

Therefore, we have A(A(G,H),πŸ”΅)=A(G,A(H,πŸ”΅)), which means that the statement M(πŸ”΅) is true.

To show that R2 holds, we need to prove that if the statement M(X) is true for some flab X, then the statement M(T(X)) is also true.

To do this, let G,H be any two flabs. We need to prove that A(A(G,H),T(X))=A(G,A(H,T(X))).

If we assume that the statement M(X) is true, then we have A(A(G,H),X)=A(G,A(H,X)). We can use this equality to rewrite the left hand side of the desired equality as follows:
A(A(G,H),T(X))=T(A(A(G,H),X))=T(A(G,A(H,X))) (by A4)

Similarly, we can use the definition of the function A and its properties as given in A3 and A4 to rewrite the right hand side of the desired equality as follows:
A(G,A(H,T(X)))=A(G,T(A(H,X)))=T(A(G,A(H,X))) (by A4)

Since the left and right hand sides of the desired equality are equal, we have proved that M(T(X)) is true, which means that the statement R2 holds.

Since both R1 and R2 hold, we can conclude that the predicate M(W) holds for all flabs W, using the scheme R.

Comments

  1. This task involves more language understanding than the previous examples. The model still willingly uses recursive definitions/structures and combines them with some general notion of "what a proof looks like". I find this impressive.

  2. The associativity proof really has a lot going on in it. Note how the substitution patterns are (almost) perfect. The model correctly keeps the bound/quantified variables (G and H) as they are, and only manipulates the free variable (X or W). It's not perfect either: for example, in quoting R2 it failed to substitute M for the generic Q. The model also "figured out" that it has to use the scheme R in order to prove the predicate M.

  3. To keep things under control, I included a (relatively) short conversation. On other attempts, I've asked ChatGPT to also perform multiplication (which is defined recursively in terms of the addition function that is recursive on its own), and prove some more basic properties (e.g., commutativity for addition). It mostly managed to do these, but started making errors once the required computation got too deep.
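
The inductive proof can also be cross-checked by brute force on small flabs, using the same toy term representation as in the previous sketch (again my own check, not part of the conversation):

ZERO = "πŸ”΅"

def T(x):
    return ("T", x)

def A(x, y):
    return x if y == ZERO else T(A(x, y[1]))   # A3 / A4

def flab(n):
    return ZERO if n == 0 else T(flab(n - 1))  # the flab obtained by applying T n times

flabs = [flab(n) for n in range(5)]
assert all(A(A(g, h), w) == A(g, A(h, w))      # M(W): A(A(G,H),W) = A(G,A(H,W))
           for g in flabs for h in flabs for w in flabs)
print("M(W) holds for all tested flabs")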

Discussion

Should we be impressed by all these examples? Clearly, the model cannot "really" compute. It fails quite often, and fails more on "harder" examples. The fact that it manages to do anything at all of this sort is what I find remarkable.

Consider the examples in Part I. There are (at least) two possible levels of interpretation.

  1. Implementations of the actual requested function in a programming language, in recursive form, were included in the training data, maybe with enough documentation, or even examples. What the model was able to do is retrieve a familiar concept from a natural-language description. In this case, the impressive things are:
    1. generalization to (probably) arbitrary novel inputs. This by itself is non-trivial
    2. the ability to trace the computation steps (recursive calls, variable assignments, etc)
  2. The model was able to infer the logical structure of a novel function from its description, well enough to predict its execution for arbitrary novel inputs.

The second option is, clearly, more impressive than the first one, because it involves at least some degree of inferring the computation's semantics ("meaning") from its syntactic description ("form"), whereas in the first, the semantics is retrieved rather than inferred.

I think it is reasonable to argue that the Palindromes and Sequence reversal examples might fall under the first explanation. But it is harder to imagine this as the full explanation for the more made-up examples (Recursion-ad-infinitum and Multiple recursive calls). And of course, the reality could be some blend between these two explanations. The examples in Part II might also be, by themselves, somewhere "in the middle".

An important limitation to keep in mind is that these are just some curated examples, and not a rigorous study. Several other, and somewhat related, limitations are worth mentioning:

We do not know enough about the training data

In an ideal world, this limitation alone should have stopped us from any attempt to draw serious conclusions about the model's "capabilities". But even in our non-ideal world, some things can be said. It has been proposed by several people that the "reasoning" abilities (in the tradition of the field, these go under the terrible name "Chain of Thought") of ChatGPT and related GPT-3.5 models are likely the result of training on code5,6.

I find this hypothesis compelling, and probably very relevant for these "code execution" scenarios as well. If training on code is indeed a key component (such that not everything is just "scaling"), we are likely to have more open models that can replicate this kind of behavior in the foreseeable future. Another speculation is that the model was trained not only on "static" code (which we know it was) but also on pairs of code and execution results6 (or maybe even with some RL-like training against a "live" terminal).

We do not know enough about "the model"

There could be additional algorithmic components of the system that we aren't aware of, making it more than "just a big transformer". One can speculate about several possibilities here, for example:

  • some form of input summarization for long conversations (likely?)
  • some tweaks for more effective inference, like sampling from a middle layer instead of running the entire model (possibly?)
  • some actual sophisticated inference schemes which are more than 'next token prediction', e.g. mixing completions and predictions, backtracking, etc. (unlikely?)

One reason to suspect the existence of "ex-model" components might be the noticeable variability in response times, including within a single response generation. On the other hand, it is also possible that this variability is due to infrastructure/system load and other such factors.

These could be only superficial capabilities, not "real" generalization or systematicity

Indeed they could. It's hard to draw strong conclusions from just a few examples. On some level, this has to be the case: ChatGPT is not a general-purpose computer, nor is it a Python interpreter. But the fact that its approximation capabilities are strong enough to take it this far is what I find impressive. I would be extremely curious to see future studies of the internal representations used by such models when performing these symbolic tasks. Maybe one day these models can even teach us something new about The Paradox.

References

  1. Paul Smolensky, The Constituent Structure of Connectionist Mental States: A Reply to Fodor and Pylyshyn, 1988

  2. John Haugeland, Artificial Intelligence: The Very Idea, 1989

  3. Gary Marcus, Innateness, AlphaZero, and Artificial Intelligence, 2018

  4. Yoav Goldberg, The unreasonable effectiveness of Character-level Language Models, 2015/6

  5. Yao Fu, Hao Peng, and Tushar Khot, How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, 2022

  6. Yoav Goldberg, Some remarks on Large Language Models, 2023
