Intervals are the basic unit of composition for ltgt-style queries over a keyspace. Below are some typical intervals in standard interval notation, translated to ltgt queries:
// closed/open: [0,100) ->
{ gte: 0, lt: 100 }
// open/closed: (0,100] ->
{ gt: 0, lte: 100 }
// open, e.g. positive reals: (0,Infinity) ->
{ gt: 0, lt: Infinity }
// closed, e.g. non-negative extended reals: [0,Infinity] ->
{ gte: 0, lte: Infinity }
// NB: `++ \uffff` implies concatenating a high byte outside the valid UTF-8 range
// closed root interval, e.g. /^c.*/, or ['c', 'c' ++ \uffff) ->
{ gte: 'c', lt: 'c'++\uffff }
// closed root interval on an arbitrary string, e.g. /^foo.*/, or ['foo', 'foo' ++ \uffff) ->
{ gte: 'foo', lt: 'foo'++\uffff }
// open root interval, e.g. /^foo.+/, or ['foo\0', 'foo' ++ \uffff) ->
{ gte: 'foo\0', lt: 'foo'++\uffff }
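These query objects are straightforward to evaluate mechanically. Below is a minimal membership test for illustration (`contains` here is a hypothetical helper, assuming keys compare with the native relational operators, as JS numbers and strings do):

```javascript
// Hypothetical membership test for an ltgt-style interval.
// A missing bound leaves that side of the interval unbounded.
function contains(interval, key) {
  if (interval.gt  !== undefined && !(key >  interval.gt))  return false
  if (interval.gte !== undefined && !(key >= interval.gte)) return false
  if (interval.lt  !== undefined && !(key <  interval.lt))  return false
  if (interval.lte !== undefined && !(key <= interval.lte)) return false
  return true
}

contains({ gte: 0, lt: 100 }, 0)     // true (closed lower bound)
contains({ gte: 0, lt: 100 }, 100)   // false (open upper bound)
contains({ gte: 'foo\0' }, 'foobar') // true (open root interval)
```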
A range is a list of one or more intervals.
This is typically an undiscriminated union, i.e. the set-theoretic disjunction of some set of intervals:
// (0,100] | [1000,2000) ->
[{ gt: 0, lte: 100 }, { gte: 1000, lt: 2000 }]
Overlapping intervals within a range can be collapsed down to single intervals:
// (0,100] | [50,150) ->
{ gt: 0, lt: 150 }
Adjacent intervals may also be collapsed, so long as at least one of the two adjacent boundary points is closed:
// (0,50] | (50,100) ->
{ gt: 0, lt: 100 }
// (0,50] | [50,100) ->
{ gt: 0, lt: 100 }
// (0,50) | [50,100) ->
{ gt: 0, lt: 100 }
// (0,50) | (50,100) ->
[{ gt: 0, lt: 50 }, { gt: 50, lt: 100 }]
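These collapsing rules are mechanical enough to sketch. The helper below is hypothetical, not an existing ltgt API; it assumes numeric bounds, exactly one lower (gt/gte) and one upper (lt/lte) bound per interval, and that `a` begins at or before `b`:

```javascript
// Merge two intervals into one, or return null when they leave a gap
// (i.e. they are separated, or touch only at two open endpoints).
function tryUnion(a, b) {
  const aHi = a.lt !== undefined ? a.lt : a.lte
  const bLo = b.gt !== undefined ? b.gt : b.gte
  const aHiClosed = a.lte !== undefined
  const bLoClosed = b.gte !== undefined
  // mergeable iff the intervals overlap, or touch at a closed endpoint
  if (bLo > aHi || (bLo === aHi && !aHiClosed && !bLoClosed)) return null
  const merged = a.gt !== undefined ? { gt: a.gt } : { gte: a.gte }
  // keep whichever upper bound reaches further
  const bHi = b.lt !== undefined ? b.lt : b.lte
  const useB = bHi > aHi || (bHi === aHi && b.lte !== undefined)
  if (useB ? b.lte !== undefined : aHiClosed) merged.lte = useB ? bHi : aHi
  else merged.lt = useB ? bHi : aHi
  return merged
}

tryUnion({ gt: 0, lte: 50 }, { gt: 50, lt: 100 }) // { gt: 0, lt: 100 }
tryUnion({ gt: 0, lt: 50 }, { gt: 50, lt: 100 })  // null (gap at 50)
```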
NB: this idea is kind of interesting in the context of using exhaustive pattern matching to define queries, but not at all necessary.
Another way to think about ranges of intervals is as a disjoint union, or tagged sum. From this perspective, overlapping intervals shouldn't be collapsed:
// [0,50) + [50,100) ->
[{ gte: 0, lt: 50 }, { gte: 50, lt: 100 }]
This is distinct from an undiscriminated union, where overlapping intervals may be unioned together:
// [0,50) | [50,100) ->
{ gte: 0, lt: 100 }
Discriminated ranges are equivalent to their undiscriminated counterparts when no intervals overlap. This is reflected within the data structure:
// discriminated range:
// [0,50) + (50,100) ->
[{ gte: 0, lt: 50 }, { gt: 50, lt: 100 }]
// undiscriminated range (completely identical):
// [0,50) | (50,100) ->
[{ gte: 0, lt: 50 }, { gt: 50, lt: 100 }]
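Put another way, a discriminated range can be safely reinterpreted as an undiscriminated one exactly when its intervals are pairwise disjoint. A sketch of that check under some simplifying assumptions (numeric bounds, one bound per side, intervals sorted by lower bound); both helpers are hypothetical:

```javascript
// Overlap test between two intervals, where `a` starts at or before `b`;
// touching endpoints only overlap when both are closed.
function overlaps(a, b) {
  const aHi = a.lt !== undefined ? a.lt : a.lte
  const bLo = b.gt !== undefined ? b.gt : b.gte
  return bLo < aHi || (bLo === aHi && a.lte !== undefined && b.gte !== undefined)
}

// A range (sorted by lower bound) reads the same as either kind of union
// when no two neighboring intervals overlap.
function isDisjoint(range) {
  return range.every((iv, i) => i === 0 || !overlaps(range[i - 1], iv))
}

isDisjoint([{ gte: 0, lt: 50 }, { gt: 50, lt: 100 }])  // true
isDisjoint([{ gte: 0, lt: 50 }, { gte: 25, lt: 100 }]) // false
```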
Thus, discriminated ranges are information-preserving, allowing for properties like the ability to use exhaustive pattern matching to author queries.
They are also useful when thinking about string ranges as a limited form of regexp. For example, imagine a tuple path syntax like so:
path('/foo/bar/{:number}/{+:any}')
This might compile down to the interval:
{
gte: bw.encode([ 'foo', 'bar', bw.sort.number.LOWER, bw.sort.any.LOWER ]),
lte: bw.encode([ 'foo', 'bar', bw.sort.number.UPPER, bw.sort.any.UPPER ])
}
This could also allow trailing component quantification:
path('/foo/bar/{:number}/{*:any}')
This just denotes any trailing component as optional:
{
gte: bw.encode([ 'foo', 'bar', bw.sort.number.LOWER ]),
lte: bw.encode([ 'foo', 'bar', bw.sort.number.UPPER, bw.sort.any.UPPER ])
}
The above quantifiers are only sensible for lexicographically sorted arrays. More powerful would be length-prefixed lists (tuples), which would sort in shortlex order (first by length, then componentwise). This would allow alternative forms of quantification:
// NB: the '@' prefix represents a shortlex-ordered array in the examples below, in lieu of a real syntax
path('@/foo/baz/{:number}/{2:any}')
This just says the /foo/baz/<number> component must be followed by exactly 2 additional components. Another way of saying this would be:
path('@/foo/baz/{:number}/{:any}/{:any}')
This can be compiled down to a single contiguous interval only over shortlex-ordered structures:
// NB: `bw.tuple` intended to represent arrays encoded with length as prefix
{
gte: bw.tuple('foo', 'baz', bw.sort.number.LOWER, bw.sort.any.LOWER, bw.sort.any.LOWER),
lte: bw.tuple('foo', 'baz', bw.sort.number.UPPER, bw.sort.any.UPPER, bw.sort.any.UPPER)
}
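For reference, shortlex ordering is easy to state directly: shorter tuples sort strictly before longer ones, with ties broken componentwise. The comparator below is illustrative (components are assumed mutually comparable with `<`):

```javascript
// Shortlex comparator: first by length, then componentwise.
function shortlexCompare(a, b) {
  if (a.length !== b.length) return a.length < b.length ? -1 : 1
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1
    if (a[i] > b[i]) return 1
  }
  return 0
}

// Every 5-element tuple extending ['foo', 'baz', <number>] falls in one
// contiguous run, which is why {2:any} compiles to a single interval.
shortlexCompare(['foo', 'baz', 1, 'x'], ['foo', 'baz', 1, 'x', 'y'])      // -1
shortlexCompare(['foo', 'baz', 1, 'a', 'a'], ['foo', 'baz', 2, 'a', 'a']) // -1
```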
This ordering would also allow individual components to be further refined:
path('@/foo/baz/{:number[0,12]}/{:number(0,100)}/:string')
Assuming the existence of some type-specific epsilon functions necessary for opening and closing inner intervals:
{
gte: bw.tuple('foo', 'baz', 0, bw.sort.number.above(0), bw.sort.string.LOWER),
lte: bw.tuple('foo', 'baz', 12, bw.sort.number.below(100), bw.sort.string.UPPER)
}
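For doubles, such epsilon functions can be realized as the next/previous representable value, via the IEEE-754 bit pattern. The `above`/`below` helpers below are hypothetical stand-ins for the imagined bw.sort.number.above/below, valid for finite x >= 0 (for `above`) and finite x > 0 (for `below`):

```javascript
// Epsilon helpers for numeric bounds, via the raw IEEE-754 bits.
const bits = new ArrayBuffer(8)
const f64 = new Float64Array(bits)
const u64 = new BigUint64Array(bits)

function above(x) { // smallest double strictly greater than x
  f64[0] = x
  u64[0] += 1n
  return f64[0]
}

function below(x) { // largest double strictly less than x
  f64[0] = x
  u64[0] -= 1n
  return f64[0]
}

above(0) // 5e-324 (Number.MIN_VALUE)
below(100) < 100 && above(below(100)) === 100 // true
```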
Exact and string-prefixed matches can also be reliably generated:
path('@/foo/baz/{:number[10]}/{:string[abc_[A-Z]]}/:string(bar.+)')
This might look something like:
{
gte: bw.tuple('foo', 'baz', 10, 'abc_A', bw.sort.string.next('bar')),
lte: bw.tuple('foo', 'baz', 10, 'abc_Z', bw.sort.string.above(bw.sort.string.next('bar')))
}
The string epsilon fn `next` would just concatenate the list separator (a null byte).
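Sketched concretely, with the separator assumed to be a null byte that cannot appear inside an encoded component:

```javascript
// Hypothetical string epsilon: least key sorting strictly after `s`, given
// that '\x00' (the list separator) never appears inside a component.
const next = s => s + '\x00'

next('bar') > 'bar'  // true: excludes the exact key 'bar'
next('bar') < 'barx' // true: still below every longer 'bar...' key
```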
String searches are analogous to some (small) subset of prefix-rooted regular expressions.
This would also require some codepoint-specific helpers to generate 'abc_A' and 'abc_Z', e.g. something like `sort.string.concat('abc_A', sort.string.below(sort.string.range('A', 'C')))`. Ideally the underlying API would be designed in a way that supports a direct, obvious desugaring from the kind of range refinement syntax described above.