Intervals are the basic unit of composition for ltgt-style queries over a keyspace. Below are some typical intervals in standard interval notation, translated to ltgt queries:
// closed/open: [0,100) ->
{ gte: 0, lt: 100 }
// open/closed: (0,100] ->
{ gt: 0, lte: 100 }
// open, e.g. positive reals: (0,Infinity) ->
{ gt: 0, lt: Infinity }
// closed, e.g. non-negative extended reals: [0,Infinity] ->
{ gte: 0, lte: Infinity }
// NB: `++ \uffff` implies concatenating a high byte outside the valid UTF-8 range
// closed root interval, e.g. /^c.*/, or ['c', 'c' ++ \uffff) ->
{ gte: 'c', lt: 'c'++\uffff }
// closed root interval on an arbitrary string, e.g. /^foo.*/, or ['foo', 'foo' ++ \uffff) ->
{ gte: 'foo', lt: 'foo'++\uffff }
// open root interval, e.g. /^foo.+/, or ['foo\0', 'foo' ++ \uffff) ->
{ gte: 'foo\0', lt: 'foo'++\uffff }
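These query objects are straightforward to evaluate mechanically. Below is a minimal membership test for illustration (`contains` here is a hypothetical helper, assuming keys compare with the native relational operators, as JS numbers and strings do):

```javascript
// Hypothetical membership test for an ltgt-style interval.
// A missing bound leaves that side of the interval unbounded.
function contains(interval, key) {
  if (interval.gt  !== undefined && !(key >  interval.gt))  return false
  if (interval.gte !== undefined && !(key >= interval.gte)) return false
  if (interval.lt  !== undefined && !(key <  interval.lt))  return false
  if (interval.lte !== undefined && !(key <= interval.lte)) return false
  return true
}

contains({ gte: 0, lt: 100 }, 0)     // true (closed lower bound)
contains({ gte: 0, lt: 100 }, 100)   // false (open upper bound)
contains({ gte: 'foo\0' }, 'foobar') // true (open root interval)
```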
A range is a list of one or more intervals.
This is typically an undiscriminated union, i.e. the set-theoretic disjunction of some set of intervals:
// (0,100] | [1000,2000) ->
[{ gt: 0, lte: 100 }, { gte: 1000, lt: 2000 }]
Overlapping intervals within a range can be collapsed down to single intervals:
// (0,100] | [50,150) ->
{ gt: 0, lt: 150 }
Adjacent intervals may also be collapsed, so long as at least one of the two adjacent boundary points is closed:
// (0,50] | (50,100) ->
{ gt: 0, lt: 100 }
// (0,50] | [50,100) ->
{ gt: 0, lt: 100 }
// (0,50) | [50,100) ->
{ gt: 0, lt: 100 }
// (0,50) | (50,100) ->
[{ gt: 0, lt: 50 }, { gt: 50, lt: 100 }]
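These collapsing rules are mechanical enough to sketch. The helper below is hypothetical, not an existing ltgt API; it assumes numeric bounds, exactly one lower (gt/gte) and one upper (lt/lte) bound per interval, and that `a` begins at or before `b`:

```javascript
// Merge two intervals into one, or return null when they leave a gap
// (i.e. they are separated, or touch only at two open endpoints).
function tryUnion(a, b) {
  const aHi = a.lt !== undefined ? a.lt : a.lte
  const bLo = b.gt !== undefined ? b.gt : b.gte
  const aHiClosed = a.lte !== undefined
  const bLoClosed = b.gte !== undefined
  // mergeable iff the intervals overlap, or touch at a closed endpoint
  if (bLo > aHi || (bLo === aHi && !aHiClosed && !bLoClosed)) return null
  const merged = a.gt !== undefined ? { gt: a.gt } : { gte: a.gte }
  // keep whichever upper bound reaches further
  const bHi = b.lt !== undefined ? b.lt : b.lte
  const useB = bHi > aHi || (bHi === aHi && b.lte !== undefined)
  if (useB ? b.lte !== undefined : aHiClosed) merged.lte = useB ? bHi : aHi
  else merged.lt = useB ? bHi : aHi
  return merged
}

tryUnion({ gt: 0, lte: 50 }, { gt: 50, lt: 100 }) // { gt: 0, lt: 100 }
tryUnion({ gt: 0, lt: 50 }, { gt: 50, lt: 100 })  // null (gap at 50)
```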
NB: this idea is kind of interesting in the context of using exhaustive pattern matching to define queries, but not at all necessary.
Another way to think about ranges of intervals is as a disjoint union, or tagged sum. From this perspective, overlapping intervals shouldn't be collapsed:
// [0,50) + [50,100) ->
[{ gte: 0, lt: 50 }, { gte: 50, lt: 100 }]
This is distinct from an undiscriminated union, where overlapping intervals may be unioned together:
// [0,50) | [50,100) ->
{ gte: 0, lt: 100 }
Discriminated ranges are equivalent to their undiscriminated counterparts when no intervals overlap. This is reflected within the data structure:
// discriminated range:
// [0,50) + (50,100) ->
[{ gte: 0, lt: 50 }, { gt: 50, lt: 100 }]
// undiscriminated range (completely identical):
// [0,50) | (50,100) ->
[{ gte: 0, lt: 50 }, { gt: 50, lt: 100 }]
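Put another way, a discriminated range can be safely reinterpreted as an undiscriminated one exactly when its intervals are pairwise disjoint. A sketch of that check under some simplifying assumptions (numeric bounds, one bound per side, intervals sorted by lower bound); both helpers are hypothetical:

```javascript
// Overlap test between two intervals, where `a` starts at or before `b`;
// touching endpoints only overlap when both are closed.
function overlaps(a, b) {
  const aHi = a.lt !== undefined ? a.lt : a.lte
  const bLo = b.gt !== undefined ? b.gt : b.gte
  return bLo < aHi || (bLo === aHi && a.lte !== undefined && b.gte !== undefined)
}

// A range (sorted by lower bound) reads the same as either kind of union
// when no two neighboring intervals overlap.
function isDisjoint(range) {
  return range.every((iv, i) => i === 0 || !overlaps(range[i - 1], iv))
}

isDisjoint([{ gte: 0, lt: 50 }, { gt: 50, lt: 100 }])  // true
isDisjoint([{ gte: 0, lt: 50 }, { gte: 25, lt: 100 }]) // false
```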
Thus, discriminated ranges are information-preserving, allowing for properties like the ability to use exhaustive pattern matching to author queries.
They are also useful when thinking about string ranges as a limited form of regexp. For example, imagine a tuple path syntax like so:
path('/foo/bar/{:number}/{+:any}')
This might compile down to the interval:
{
gte: bw.encode([ 'foo', 'bar', bw.sort.number.LOWER, bw.sort.any.LOWER ]),
lte: bw.encode([ 'foo', 'bar', bw.sort.number.UPPER, bw.sort.any.UPPER ])
}
This could also allow trailing component quantification:
path('/foo/bar/{:number}/{*:any}')
This just denotes any trailing component as optional:
{
gte: bw.encode([ 'foo', 'bar', bw.sort.number.LOWER ]),
lte: bw.encode([ 'foo', 'bar', bw.sort.number.UPPER, bw.sort.any.UPPER ])
}
The above quantifiers are only sensible for lexicographically sorted arrays. More powerful would be length-prefixed lists (tuples), which would sort in shortlex order (first by length, then componentwise). This would allow alternative forms of quantification:
// NB: the '@' prefix represents a shortlex-ordered array in the examples below, in lieu of a real syntax
path('@/foo/baz/{:number}/{2:any}')
This just says the /foo/baz/<number> component must be followed by exactly 2 additional components. Another way of saying this would be:
path('@/foo/baz/{:number}/{:any}/{:any}')
This can be compiled down to a single contiguous interval only over shortlex-ordered structures:
// NB: `bw.tuple` intended to represent arrays encoded with length as prefix
{
gte: bw.tuple('foo', 'baz', bw.sort.number.LOWER, bw.sort.any.LOWER, bw.sort.any.LOWER),
lte: bw.tuple('foo', 'baz', bw.sort.number.UPPER, bw.sort.any.UPPER, bw.sort.any.UPPER)
}
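For reference, shortlex ordering is easy to state directly: shorter tuples sort strictly before longer ones, with ties broken componentwise. The comparator below is illustrative (components are assumed mutually comparable with `<`):

```javascript
// Shortlex comparator: first by length, then componentwise.
function shortlexCompare(a, b) {
  if (a.length !== b.length) return a.length < b.length ? -1 : 1
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1
    if (a[i] > b[i]) return 1
  }
  return 0
}

// Every 5-element tuple extending ['foo', 'baz', <number>] falls in one
// contiguous run, which is why {2:any} compiles to a single interval.
shortlexCompare(['foo', 'baz', 1, 'x'], ['foo', 'baz', 1, 'x', 'y'])      // -1
shortlexCompare(['foo', 'baz', 1, 'a', 'a'], ['foo', 'baz', 2, 'a', 'a']) // -1
```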
This ordering would also allow individual components to be further refined:
path('@/foo/baz/{:number[0,12]}/{:number(0,100)}/:string')
Assuming the existence of some type-specific epsilon functions necessary for opening and closing inner intervals:
{
gte: bw.tuple('foo', 'baz', 0, bw.sort.number.above(0), bw.sort.string.LOWER),
lte: bw.tuple('foo', 'baz', 12, bw.sort.number.below(100), bw.sort.string.UPPER)
}
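For doubles, such epsilon functions can be realized as the next/previous representable value, via the IEEE-754 bit pattern. The `above`/`below` helpers below are hypothetical stand-ins for the imagined bw.sort.number.above/below, valid for finite x >= 0 (for `above`) and finite x > 0 (for `below`):

```javascript
// Epsilon helpers for numeric bounds, via the raw IEEE-754 bits.
const bits = new ArrayBuffer(8)
const f64 = new Float64Array(bits)
const u64 = new BigUint64Array(bits)

function above(x) { // smallest double strictly greater than x
  f64[0] = x
  u64[0] += 1n
  return f64[0]
}

function below(x) { // largest double strictly less than x
  f64[0] = x
  u64[0] -= 1n
  return f64[0]
}

above(0) // 5e-324 (Number.MIN_VALUE)
below(100) < 100 && above(below(100)) === 100 // true
```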
Exact and string-prefixed matches can also be reliably generated:
path('@/foo/baz/{:number[10]}/{:string[abc_[A-Z]]}/:string(bar.+)')
This might look something like:
{
gte: bw.tuple('foo', 'baz', 10, 'abc_A', bw.sort.string.next('bar')),
lte: bw.tuple('foo', 'baz', 10, 'abc_Z', bw.sort.string.above(bw.sort.string.next('bar')))
}
The string epsilon fn `next` would just concatenate the list separator (a null byte).
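Sketched concretely, with the separator assumed to be a null byte that cannot appear inside an encoded component:

```javascript
// Hypothetical string epsilon: least key sorting strictly after `s`, given
// that '\x00' (the list separator) never appears inside a component.
const next = s => s + '\x00'

next('bar') > 'bar'  // true: excludes the exact key 'bar'
next('bar') < 'barx' // true: still below every longer 'bar...' key
```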
String searches are analogous to some (small) subset of prefix-rooted regular expressions.
This would also require some codepoint-specific helpers to generate 'abc_A' and 'abc_Z', e.g. something like `sort.string.concat('abc_A', sort.string.below(sort.string.range('A', 'C')))`. Ideally the underlying API would be designed in a way that supports a direct, obvious desugaring from the kind of range refinement syntax described above.