{{ message }}

Instantly share code, notes, and snippets.

Last active Jun 30, 2021
Radix tree and validation in TLA+
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
 ------------------------ MODULE RadixTreesValidation ------------------------ EXTENDS FiniteSets, Integers, RadixTrees \* Set of characters to use for the alphabet of generated strings. CONSTANT Alphabet \* Length of input strings generated CONSTANT MinLength, MaxLength ASSUME /\ {MinLength, MaxLength} \subseteq Nat /\ MinLength <= MaxLength \* Number of unique elements to construct the radix tree with. This \* is a set of numbers so you can test with inputs of multiple sizes. CONSTANT ElementCounts ASSUME ElementCounts \subseteq Nat \* Inputs is the set of input strings valid for the tree. Inputs == UNION { [1..n -> Alphabet]: n \in MinLength..MaxLength } \* InputSets is the full set of possible inputs we can send to the radix tree. InputSets == { T \in SUBSET Inputs: Cardinality(T) \in ElementCounts } ----------------------------------------------------------------------------- \* The range of a radix tree should be the set of its inputs. RangeIsInput == \A input \in InputSets: Range(RadixTree(input)) = input \* The expression that should be checked for validity in the model. Valid == RangeIsInput ============================================================================= \* Modification History \* Last modified Tue Jun 29 09:06:56 PDT 2021 by mitchellh \* Created Tue Jun 29 08:02:38 PDT 2021 by mitchellh

Pretty solid modeling! Couple comments on my first quick skim:

You can write \A x, y \in set: instead of \A x \in set, y \in set.

On line #56, you have a statement of form IF x THEN TRUE ELSE P. The truth table of this is

x P o
T T T
T F T
F T T
F F F

So it's just x \/ P.

I'd probably write longestPrefix as something like

longestPrefix(set) ==
LET
seq == ShortestSeq(set)
preflens == {0} \cup {
i \in 1..Len(seq):
\A s1, s2 in set: SubSeq(s1, 1, i) = SubSeq(s2, 1, i) }
IN SubSeq(seq, 1, i)

(SubSeq(seq, 1, 0) is the empty sequence in all cases)

range and RadixTree are the core parts of the algorithm but I haven't studied them yet.

### mitchellh commented Jun 29, 2021

 Thanks @hwayne! I've made a couple of those changes (I left longestPrefix as is for now). I had x \/ P prior but I think due to some iteration/refactoring it came out into that weird IF expr. Thanks for catching that again, cleaned that up. Note that I originally tried to write this similar to your LinkedList ex in Ch 8 of your book where I created the set of all [including invalid] trees that are roughly shaped like radix trees, then use filtering to get the one I want. Even for small alphabets/lengths this created too large of sets, so I had to fall back to a more iterative/programmery approach seen here. Not sure if I did something wrong.

### hwayne commented Jun 29, 2021

 The iterative programmatic approach is perfectly fine; while it's nice to write data structures as formal definitions, it quickly leads to exponential blowup, so we have to compromise in the face of reality :)

### lemmy commented Jun 29, 2021

 @mitchellh TLC lets you override operators with code should the pure TLA+ definition become a bottleneck (a profiler helps find bottlenecks). This way, we can write specs that favor readability over efficiency while helping tools. Almost all of the standard modules and some of the community modules have code overrides.