mitchellh/RadixTrees.tla

## RadixTrees.tla
This module contains operations for working with radix trees. A radix tree
is a data structure for efficient storage and lookup of values that often
share prefixes, typically used with strings.

A common question when I show this to people is: how do I add to the tree?
delete? update? For these, grab the Range of the tree, use set logic to
add/remove any elements, and construct a new tree with RadixTree.

For educational purposes, I've heavily commented all the operations. I
recommend using the constant expression evaluator to try the building blocks
to learn how things work if you're confused. That's how I learned.

----------------------------- MODULE RadixTrees -----------------------------
LOCAL INSTANCE FiniteSets
LOCAL INSTANCE Sequences
LOCAL INSTANCE Integers

-----------------------------------------------------------------------------

\* Helpers that aren't Radix-tree specific.

\* shortestSeq returns the shortest sequence in a set.
LOCAL shortestSeq(set) ==
  CHOOSE seq \in set:
    \A other \in set:
      /\ Len(seq) <= Len(other)

\* Filter the sequence to only return the values that start with c.
LOCAL filterPrefix(set, c) == { seq \in set: seq[1] = c }

\* Strip the prefix from each element in set. This assumes that each
\* element in set already starts with prefix. Empty values will not
\* be returned.
LOCAL stripPrefix(set, prefix) ==
  { SubSeq(seq, Len(prefix)+1, Len(seq)): seq \in set \ {prefix} }

\* Returns the set of first characters of a set of char sequences.
LOCAL firstChars(set) == { seq[1]: seq \in set }

\* Find the longest shared prefix of a set of sequences. Sequences can
\* be different lengths, but must have comparable types.
\* i.e. longestPrefix({1,2},{1,2,3},{1,2,5}) == {1,2}
LOCAL longestPrefix(set) ==
  \* Every item must at least have a common first character otherwise
  \* the longest prefix is surely empty
  IF \A x \in set, y \in set:
    /\ Len(x) >= 1
    /\ Len(y) >= 1
    /\ x[1] = y[1]
  THEN
  LET
    seq == shortestSeq(set)
    end == CHOOSE end \in 1..Len(seq):
      /\ \A i \in 1..end:
        \A x \in set, y \in set: x[i] = y[i]
      /\ IF Len(seq) <= end
         THEN TRUE
         ELSE \E x \in set, y \in set: x[end+1] /= y[end+1]
  IN SubSeq(seq, 1, end)
  ELSE <<>>

-----------------------------------------------------------------------------

\* Radix tree helpers

RECURSIVE range(_, _)
LOCAL range(T, prefix) ==
  LET
    current == IF T.Value THEN {prefix \o T.Prefix} ELSE {}
      \* current value of node (if exists)

    children == UNION {
      range(T.Edges[edge], prefix \o T.Prefix):
      edge \in DOMAIN T.Edges
    }
      \* child values for each edge. this creates a set of sets
      \* so we call union to flatten it.
  IN current \cup children

-----------------------------------------------------------------------------

\* Returns the constructed radix tree for the set of keys Keys.
RECURSIVE RadixTree(_)
RadixTree(Keys) ==
  IF Keys = {} THEN {} \* base case, no keys empty tree
  ELSE LET
    prefix == longestPrefix(Keys)
      \* longest shared prefix
    keys == stripPrefix(Keys, prefix)
      \* keys for children, prefix stripped
    edgeLabels == firstChars(stripPrefix(Keys, prefix))
      \* labels for each edge (single characters)
  IN [
    Prefix |-> prefix,
    Value |-> prefix \in Keys, \* true if this represents an inserted value
    Edges |-> [L \in edgeLabels |-> RadixTree(filterPrefix(keys, L))]
  ]

\* Range returns all of the values that are in the radix tree T.
Range(T) == range(T, <<>>)

=============================================================================
\* Modification History
\* Last modified Tue Jun 29 09:07:10 PDT 2021 by mitchellh
\* Created Mon Jun 28 08:08:13 PDT 2021 by mitchellh

## RadixTreesValidation.tla
------------------------ MODULE RadixTreesValidation ------------------------
EXTENDS FiniteSets, Integers, RadixTrees

\* Set of characters to use for the alphabet of generated strings.
CONSTANT Alphabet

\* Length of input strings generated
CONSTANT MinLength, MaxLength
ASSUME
  /\ {MinLength, MaxLength} \subseteq Nat
  /\ MinLength <= MaxLength

\* Number of unique elements to construct the radix tree with. This
\* is a set of numbers so you can test with inputs of multiple sizes.
CONSTANT ElementCounts
ASSUME ElementCounts \subseteq Nat

\* Inputs is the set of input strings valid for the tree.
Inputs == UNION { [1..n -> Alphabet]: n \in MinLength..MaxLength }

\* InputSets is the full set of possible inputs we can send to the radix tree.
InputSets == { T \in SUBSET Inputs: Cardinality(T) \in ElementCounts }

-----------------------------------------------------------------------------

\* The range of a radix tree should be the set of its inputs.
RangeIsInput ==
  \A input \in InputSets:
    Range(RadixTree(input)) = input

\* The expression that should be checked for validity in the model.
Valid == RangeIsInput

=============================================================================
\* Modification History
\* Last modified Tue Jun 29 09:06:56 PDT 2021 by mitchellh
\* Created Tue Jun 29 08:02:38 PDT 2021 by mitchellh
	This module contains operations for working with radix trees. A radix tree
	is a data structure for efficient storage and lookup of values that often
	share prefixes, typically used with strings.

	A common question when I show this to people is: how do I add to the tree?
	delete? update? For these, grab the Range of the tree, use set logic to
	add/remove any elements, and construct a new tree with RadixTree.

	For educational purposes, I've heavily commented all the operations. I
	recommend using the constant expression evaluator to try the building blocks
	to learn how things work if you're confused. That's how I learned.

	----------------------------- MODULE RadixTrees -----------------------------
	LOCAL INSTANCE FiniteSets
	LOCAL INSTANCE Sequences
	LOCAL INSTANCE Integers

	-----------------------------------------------------------------------------

	\* Helpers that aren't Radix-tree specific.

	\* shortestSeq returns the shortest sequence in a set.
	LOCAL shortestSeq(set) ==
	CHOOSE seq \in set:
	\A other \in set:
	/\ Len(seq) <= Len(other)

	\* Filter the sequence to only return the values that start with c.
	LOCAL filterPrefix(set, c) == { seq \in set: seq[1] = c }

	\* Strip the prefix from each element in set. This assumes that each
	\* element in set already starts with prefix. Empty values will not
	\* be returned.
	LOCAL stripPrefix(set, prefix) ==
	{ SubSeq(seq, Len(prefix)+1, Len(seq)): seq \in set \ {prefix} }

	\* Returns the set of first characters of a set of char sequences.
	LOCAL firstChars(set) == { seq[1]: seq \in set }

	\* Find the longest shared prefix of a set of sequences. Sequences can
	\* be different lengths, but must have comparable types.
	\* i.e. longestPrefix({1,2},{1,2,3},{1,2,5}) == {1,2}
	LOCAL longestPrefix(set) ==
	\* Every item must at least have a common first character otherwise
	\* the longest prefix is surely empty
	IF \A x \in set, y \in set:
	/\ Len(x) >= 1
	/\ Len(y) >= 1
	/\ x[1] = y[1]
	THEN
	LET
	seq == shortestSeq(set)
	end == CHOOSE end \in 1..Len(seq):
	/\ \A i \in 1..end:
	\A x \in set, y \in set: x[i] = y[i]
	/\ IF Len(seq) <= end
	THEN TRUE
	ELSE \E x \in set, y \in set: x[end+1] /= y[end+1]
	IN SubSeq(seq, 1, end)
	ELSE <<>>

	-----------------------------------------------------------------------------

	\* Radix tree helpers

	RECURSIVE range(_, _)
	LOCAL range(T, prefix) ==
	LET
	current == IF T.Value THEN {prefix \o T.Prefix} ELSE {}
	\* current value of node (if exists)

	children == UNION {
	range(T.Edges[edge], prefix \o T.Prefix):
	edge \in DOMAIN T.Edges
	}
	\* child values for each edge. this creates a set of sets
	\* so we call union to flatten it.
	IN current \cup children

	-----------------------------------------------------------------------------

	\* Returns the constructed radix tree for the set of keys Keys.
	RECURSIVE RadixTree(_)
	RadixTree(Keys) ==
	IF Keys = {} THEN {} \* base case, no keys empty tree
	ELSE LET
	prefix == longestPrefix(Keys)
	\* longest shared prefix
	keys == stripPrefix(Keys, prefix)
	\* keys for children, prefix stripped
	edgeLabels == firstChars(stripPrefix(Keys, prefix))
	\* labels for each edge (single characters)
	IN [
	Prefix \|-> prefix,
	Value \|-> prefix \in Keys, \* true if this represents an inserted value
	Edges \|-> [L \in edgeLabels \|-> RadixTree(filterPrefix(keys, L))]
	]

	\* Range returns all of the values that are in the radix tree T.
	Range(T) == range(T, <<>>)

	=============================================================================
	\* Modification History
	\* Last modified Tue Jun 29 09:07:10 PDT 2021 by mitchellh
	\* Created Mon Jun 28 08:08:13 PDT 2021 by mitchellh
	------------------------ MODULE RadixTreesValidation ------------------------
	EXTENDS FiniteSets, Integers, RadixTrees

	\* Set of characters to use for the alphabet of generated strings.
	CONSTANT Alphabet

	\* Length of input strings generated
	CONSTANT MinLength, MaxLength
	ASSUME
	/\ {MinLength, MaxLength} \subseteq Nat
	/\ MinLength <= MaxLength

	\* Number of unique elements to construct the radix tree with. This
	\* is a set of numbers so you can test with inputs of multiple sizes.
	CONSTANT ElementCounts
	ASSUME ElementCounts \subseteq Nat

	\* Inputs is the set of input strings valid for the tree.
	Inputs == UNION { [1..n -> Alphabet]: n \in MinLength..MaxLength }

	\* InputSets is the full set of possible inputs we can send to the radix tree.
	InputSets == { T \in SUBSET Inputs: Cardinality(T) \in ElementCounts }

	-----------------------------------------------------------------------------

	\* The range of a radix tree should be the set of its inputs.
	RangeIsInput ==
	\A input \in InputSets:
	Range(RadixTree(input)) = input

	\* The expression that should be checked for validity in the model.
	Valid == RangeIsInput

	=============================================================================
	\* Modification History
	\* Last modified Tue Jun 29 09:06:56 PDT 2021 by mitchellh
	\* Created Tue Jun 29 08:02:38 PDT 2021 by mitchellh