Many of the tasks we care about either expand a tree by one layer (add factors, subquestions, datasets) or summarize a tree layer into one or a few nodes (summarize, cluster, rank).
This raises two questions:
- Is there an existing way to characterize these tasks so that we can make an app that supports all and only the ones we want?
- Language models have a finite context window. How can we clearly think about tasks that operate on arbitrarily large inputs?
Technically, we can think of expansion tasks as unfolds (creating structure) and summarization-like tasks as folds (reducing structure). They're defined like this:
def fold(f, seed, lst):
"""
Reduces a list to a value (which could be another list)
by iteratively applying a function to each element of the
list and the seed value.
f: Maps list value and seed to new seed
seed: The state for the fold
lst: The list to be reduced
"""
if not lst:
return seed
else:
return fold(f, f(lst[0], seed), lst[1:])
def unfold(p, f, g, seed):
"""
Creates a list from a seed value by iteratively
applying a function to the seed.
p: Determines when to stop unfolding.
f: Maps each seed value to the corresponding list element.
g: Maps each seed value to next seed value.
seed: The state for the unfold.
"""
if p(seed):
return []
else:
return [f(seed)] + unfold(p, f, g, g(seed))
These functions allow us to specify a task that operates on an arbitrarily large data structure using only operations on one or two elements. Here's how to use unfold to make an arbitrarily large list of numbers:
def range(n):
return unfold(lambda x: x == n,
lambda x: x,
lambda x: x + 1,
0)
None of the functions passed as arguments to unfold need to know about more than one element.
And here's how to sum a list of numbers:
def sum(xs):
return fold(lambda i, j: i + j,
0,
xs)
The function passed as argument to fold only needs to know about pairs of numbers.
Language models have a finite context window and so if we want to use them to transform large inputs we need a reduction like this.
We can frame most of the workflows we care about as combinations of folds and unfolds. For example, consider this sequence:
- Generate relevant subquestions (unfold)
- Find a dataset for each subquestion (fold)
- Look up relevant data in each dataset (fold + unfold)
- Answer each subquestion given sources (fold + fold)
- Summarize subquestion answers into overall answer (fold)