INTERVALS

Intervals are the basic unit of composition for ltgt-style queries over a keyspace. Below is a list of some typical intervals in standard interval notation, translated to ltgt queries...

// closed/open: [0,100) ->
{ gte: 0, lt: 100 }

// open/closed: (0,100] ->
{ gt: 0, lte: 100 }

// open, e.g. positive reals: (0,Infinity) ->
{ gt: 0, lt: Infinity }

// closed, e.g. non-negative extended reals: [0,Infinity] ->
{ gte: 0, lte: Infinity }

// NB: `++\uffff` implies concatenating high code units outside the valid UTF-8 range
// closed root interval, e.g. /^c.*/, or ['c','c'++\uffff] ->
{ gte: 'c', lte: 'c'++\uffff }

// closed root interval on an arbitrary string, e.g. /^foo.*/, or ['foo','foo'++\uffff] ->
{ gte: 'foo', lte: 'foo'++\uffff }

// open root interval, e.g. /^foo.+/, or ['foo\0','foo'++\uffff) ->
{ gte: 'foo\0', lt: 'foo'++\uffff }
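
Any of these interval objects can be handed directly to a levelup-style read stream, which accepts the same gt/gte/lt/lte options. A minimal sketch, assuming `db` is a levelup instance:

// iterate every key with the string prefix 'foo' (the \uffff caveats above apply)
db.createReadStream({ gte: 'foo', lte: 'foo' + '\uffff' })
  .on('data', function (kv) {
    console.log(kv.key, kv.value)
  })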

RANGES

A range is a list of one or more intervals.

This is typically an undiscriminated union, i.e. the set-theoretic disjunction of some set of intervals:

// (0,100] | [1000,2000) ->
[{ gt: 0, lte: 100 }, { gte: 1000, lt: 2000 }]

Overlapping intervals within a range can be collapsed down to single intervals:

// (0,100] | [50,150) ->
{ gt: 0, lt: 150 }

Adjacent intervals may also be collapsed, so long as at least one of the two adjacent boundary points is closed:

// (0,50] | (50,100) ->
{ gt: 0, lt: 100 }

// (0,50] | [50,100) ->
{ gt: 0, lt: 100 }

// (0,50) | [50,100) ->
{ gt: 0, lt: 100 }

// (0,50) | (50,100) ->
[{ gt: 0, lt: 50 }, { gt: 50, lt: 100 }]
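
The collapsing rule is mechanical enough to sketch. A rough cut for numeric intervals, using the same ltgt object shape (names are illustrative, and boundary ties on the lower bound are elided for brevity):

// collapse two intervals when they overlap, or are adjacent with at least
// one closed boundary point; returns a range of one or two intervals
function collapse(a, b) {
  var aLo = a.gt != null ? a.gt : a.gte
  var bLo = b.gt != null ? b.gt : b.gte
  if (aLo > bLo) return collapse(b, a) // normalize: `a` starts first
  var aHi = a.lt != null ? a.lt : a.lte
  var bHi = b.lt != null ? b.lt : b.lte
  // mergeable iff `a` reaches `b`, and any shared boundary point is
  // closed on at least one side
  var touches = aHi > bLo || (aHi === bLo && (a.lte != null || b.gte != null))
  if (!touches) return [a, b] // disjoint: keep a two-interval range
  var merged = {}
  if (a.gt != null) merged.gt = a.gt; else merged.gte = a.gte
  // take whichever upper boundary extends further (prefer closed on ties)
  var hi = aHi > bHi || (aHi === bHi && a.lte != null) ? a : b
  if (hi.lt != null) merged.lt = hi.lt; else merged.lte = hi.lte
  return [merged]
}

// collapse({ gt: 0, lte: 50 }, { gt: 50, lt: 100 }) -> [{ gt: 0, lt: 100 }]
// collapse({ gt: 0, lt: 50 }, { gt: 50, lt: 100 }) -> both, unchanged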

DISCRIMINATED RANGES

NB: this idea is kind of interesting in the context of using exhaustive pattern matching to define queries, but not at all necessary.

Another way to think about ranges of intervals is as a disjoint union, or tagged sum. From this perspective, overlapping or adjacent intervals shouldn't be collapsed:

// [0,50) + [50,100) ->
[{ gte: 0, lt: 50 }, { gte: 50, lt: 100 }]

This is distinct from an undiscriminated union, where such intervals may be unioned together:

// [0,50) | [50,100) ->
{ gte: 0, lt: 100 }

Discriminated ranges are equivalent to their undiscriminated counterparts when no intervals overlap. This is reflected within the data structure:

// discriminated range:
// [0,50) + (50,100) ->
[{ gte: 0, lt: 50 }, { gt: 50, lt: 100 }]

// undiscriminated range (completely identical):
// [0,50) | (50,100) ->
[{ gte: 0, lt: 50 }, { gt: 50, lt: 100 }]

Thus, discriminated ranges are information-preserving, enabling properties like the use of exhaustive pattern matching to author queries.
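
One operational payoff of keeping the tags around: each interval in a discriminated range can drive its own cursor, so the arm that matched is always known. A sketch, assuming a levelup-style `db` whose keys sort numerically (e.g. bytewise-encoded):

// one read stream per interval; `i` discriminates which arm produced a key
var range = [{ gte: 0, lt: 50 }, { gte: 50, lt: 100 }]
range.forEach(function (interval, i) {
  db.createReadStream(interval).on('data', function (kv) {
    console.log('interval ' + i + ':', kv.key)
  })
})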

They are also useful when thinking about string ranges as a limited form of regexp. For example, imagine a tuple path syntax like so:

path('/foo/bar/{:number}/{+:any}')

This might compile down to the interval:

{
  gte: bw.encode([ 'foo', 'bar', bw.sort.number.LOWER, bw.sort.any.LOWER ]),
  lte: bw.encode([ 'foo', 'bar', bw.sort.number.UPPER, bw.sort.any.UPPER ])
}

This could also allow trailing component quantification:

path('/foo/bar/{:number}/{*:any}')

This simply makes the trailing components optional:

{
  gte: bw.encode([ 'foo', 'bar', bw.sort.number.LOWER ]),
  lte: bw.encode([ 'foo', 'bar', bw.sort.number.UPPER, bw.sort.any.UPPER ])
}
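
A rough sketch of how such a path compiler might look, reusing the hypothetical `bw` encode/sort API from the examples above (the parsing is purely illustrative):

// compile the lexicographic path syntax above into a single ltgt interval
function path(pattern) {
  var lower = [], upper = []
  pattern.split('/').filter(Boolean).forEach(function (part) {
    var m = /^\{([+*]?):(\w+)\}$/.exec(part)
    if (!m) {
      lower.push(part) // literal component pins both bounds
      upper.push(part)
      return
    }
    var quantifier = m[1], type = m[2]
    // '*' marks the component optional: leave it off the lower bound
    if (quantifier !== '*') lower.push(bw.sort[type].LOWER)
    upper.push(bw.sort[type].UPPER)
  })
  return { gte: bw.encode(lower), lte: bw.encode(upper) }
}

// path('/foo/bar/{:number}/{*:any}') -> the interval shown just above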

The above quantifiers are only sensible for lexicographically sorted arrays. More powerful would be length-prefixed lists (tuples), which would sort in shortlex order (first by length, then componentwise). This would allow alternative forms of quantification:

// NB: the '@' prefix represents a shortlex array in the examples below, in lieu of a real syntax
path('@/foo/baz/{:number}/{2:any}')

This just says the /foo/baz/<number> component must be followed by exactly 2 additional components. Another way of saying this would be:

path('@/foo/baz/{:number}/{:any}/{:any}')

This can be compiled down into a single contiguous interval only over shortlex-ordered structures:

// NB: `bw.tuple` intended to represent arrays encoded with length as prefix
{
  gte: bw.tuple('foo', 'baz', bw.sort.number.LOWER, bw.sort.any.LOWER, bw.sort.any.LOWER),
  lte: bw.tuple('foo', 'baz', bw.sort.number.UPPER, bw.sort.any.UPPER, bw.sort.any.UPPER)
}
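
For contrast, the two orderings disagree as soon as lengths differ, which is exactly what makes "exactly n components" a contiguous block under shortlex:

// lexicographic (componentwise-first) vs shortlex (length-first) ordering:
//   lex:      ['a'] < ['a','a'] < ['a','a','a'] < ['a','b'] < ['b']
//   shortlex: ['a'] < ['b'] < ['a','a'] < ['a','b'] < ['a','a','a']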

This ordering would also allow individual components to be further refined:

path('@/foo/baz/{:number[0,12]}/{:number(0,100)}/:string')

Assuming the existence of some type-specific epsilon functions necessary for opening and closing inner intervals:

{
  gte: bw.tuple('foo', 'baz', 0, bw.sort.number.above(0), bw.sort.string.LOWER),
  lte: bw.tuple('foo', 'baz', 12, bw.sort.number.below(100), bw.sort.string.UPPER)
}

Exact and string-prefixed matches can also be reliably generated:

path('@/foo/baz/{:number[10]}/{:string[abc_[A-Z]]}/:string(bar.+)')

This might look something like:

{
  gte: bw.tuple('foo', 'baz', 10, 'abc_A', bw.sort.string.next('bar')),
  lte: bw.tuple('foo', 'baz', 10, 'abc_Z', bw.sort.string.above(bw.sort.string.next('bar')))
}

The string epsilon fn `next` would just concatenate the list separator (a null byte). String searches are analogous to some (small) subset of prefix-rooted regular expressions.

This would also require some codepoint-specific helpers to generate 'abc_A', 'abc_Z', e.g. something like sort.string.concat('abc_A', sort.string.below(sort.string.range('A', 'C'))). Ideally the underlying API would be designed in a way that supports a direct, obvious desugaring from the kind of range refinement syntax described above.
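
A hedged sketch of what those epsilon helpers might look like in modern JS (the null-byte `next` is the one described above; the numeric versions step to the adjacent IEEE 754 double, with -0 and NaN edge cases elided):

// hypothetical epsilon helpers for opening and closing interval boundaries
var epsilon = {
  string: {
    // smallest string strictly greater than s: append the list separator
    next: function (s) { return s + '\0' }
  },
  number: {
    // next representable double above x, via its 64-bit pattern
    above: function (x) {
      var f = new Float64Array([x])
      var u = new BigUint64Array(f.buffer)
      u[0] += x >= 0 ? 1n : -1n
      return f[0]
    },
    // next representable double below x
    below: function (x) {
      if (x === 0) return -Number.MIN_VALUE
      var f = new Float64Array([x])
      var u = new BigUint64Array(f.buffer)
      u[0] += x > 0 ? -1n : 1n
      return f[0]
    }
  }
}

// epsilon.number.above(0)   -> 5e-324 (opens an interval just above 0)
// epsilon.number.below(100) -> the closest double strictly below 100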

NB: this is all hypothetically interesting, but not likely to be of any practical importance

PARTITIONS

Some ranges represent seemingly continuous intervals with regular "holes" punched in them:

  • e.g. the range of real numbers less the integers, the range of rationals less the integers
  • these holes might be represented by some proposition function over the space
  • alternatively, these holes might be taken to represent a more "refined" space, embedded in the larger space

The spaces between the points picked out by this comb represent a unique partitioning of the domain:

  • this has a number of incredibly useful properties
    • being able to uniquely characterize these propositions would be wildly useful
  • even w/o uniqueness, there are practical benefits to being able to declaratively represent partitionings
    • this allows ranges to be defined which can jump forward
    • many leveldb-like implementations offer efficient support for this kind of "seek" (including leveldb), as sketched below
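
For instance, iterating only the "teeth" of an integer comb over a numeric keyspace, assuming a leveldown-style iterator with a `seek` method (and the hypothetical `bw` encoding from above):

// visit the record at (or just after) each integer in [0, 1000),
// seeking over the holes in between rather than scanning through them
function combScan(iterator, done) {
  var i = 0
  ;(function step() {
    if (i >= 1000) return done()
    iterator.seek(bw.encode(i)) // jump directly to the next tooth
    iterator.next(function (err, key, value) {
      if (err) return done(err)
      // ... handle (key, value), then advance to the next tooth ...
      i += 1
      step()
    })
  })()
}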

Applicable to strings of atomic elements:

  • e.g. code points as a partitioning over variable-byte UTF8
  • or, a separate partitioning into code units (2-byte UCS2/UTF16 code units)

The path syntax described above could be enhanced substantially:

  • refinements available in shortlex arrays could be extended to lexical arrays
  • a much richer regular expression syntax could be supported over string spaces
  • would still require implicitly rooted patterns, of course

TYPE SYSTEM

NB: the below would be challenging, to say the least, and likely insane to truly attempt to flesh out...

  • indexing by the componentwise inversion of elements represents a negative space

  • can be modeled with a signal that represents the direction of iteration

  • efficient support in implementations with native support for reversing cursors

  • combined with partitionings, this has some incredibly powerful properties

  • moreover, plays nicely with lex (componentwise-ordered) vs. shortlex (length-ordered) distinction

  • these represent two fundamentally distinct kinds of types

  • sized vs. unsized, to borrow rust's terminology

  • representable as type ^ size

  • may include additional "sensory" tags (e.g. unsigned/signed, floating point approximation, etc.)

    • Uint8 (octet, uint8_t) as SIZED * BIT ^ 8, as BYTE
      • the SIZED tag bit distinguishes this from a bitset (BIT ^ 8)
    • Int8 (byte, int8_t) as SIGNED * BYTE, or SIGNED * SIZED * BIT ^ 8, as SBYTE
      • SIGNED tag bit distinguishes this from an unsigned BYTE
    • Uint16 as UNSIGNED * BIT ^ 8 ^ 2, or BYTE ^ 2
    • Int32 (or int) as SIGNED * BYTE ^ 4
    • Float as APPROX * SBYTE ^ 4, as FLOAT
      • APPROX tag bit represents peculiarities of IEEE floating point space
    • Double as FLOAT ^ 2, APPROX * SBYTE ^ 8
    • Uint16Array(length: 1024) (or uint16_t[1024]) as SBYTE ^ 2 ^ (INDEXED * 1024)
      • INDEXED is a tag bit representing an element of an "indexed" family
    • 3d vector (e.g. Double[3]) as DOUBLE ^ (INDEXED * 3)
    • Int32Array (or int32_t[]) as BYTE ^ 4 ^ (INDEXED * COUNTABLE), or BYTE ^ 4 ^ UNSIZED
      • COUNTABLE is a tag bit representing countable infinity, or aleph-naught
        • UNSIZED := INDEXED * COUNTABLE, represents an indexed set with countably infinite cardinality
        • when sequences are encoded, integer representing total "size" prefixes the list of elements
          • naturally sorts length-first, then componentwise
        • UNSIZED represented by the absence of a size prefix
          • naturally sorts componentwise, then by length
    • String (or Char[], or Array) as CODEPOINT ^ COUNTABLE
      • Char above is meant to imply a type representing a unicode codepoint
      • CODEPOINT is a tag bit representing this explicitly
  • lots more ...

the algebra's not right, but with a little work it could be made to model type compatibility

these properties, combined together, form an incredibly rich type system

  • can capture and efficiently type check just about anything, including any higher order "kind" (i.e. generics)

type definitions themselves can be serialized and bitwise-compared

  • a type can be compared with a lower type (or ground value) to determine an inclusion relationship

    • this would have to be wildly more efficient than any other kind of runtime type guard
    • in one form or another, runtime type guards are fundamentally required for validating external data!
  • two types at the same "stratification" level can be compared for subsumption relationship

    • forms a complete type lattice that can be queried efficiently for meet and join
    • has a logical notion of positive and negative, forming a complete Ring
      • unlike most type systems that are just Rigs (Rings w/o negation)
      • treating promises as multiplicative inverses yields a complete Field
      • not just intellectual masturbation, this would yield some very useful properties
      • distance metric could be added to get something analogous to a Riesz space
        • (this is probably intellectual masturbation)
  • types can be serialized and persisted as data

    • useful for modeling all kinds of containment-based problems
      • a type in 1 dimension represents an interval (a 1d containment check is sketched after this list)
      • in 2 dimensions, a polygon
        • if dimensions are independent, represents a box when fully constrained
        • linear dependencies create sloped (non-orthogonal) lines for the constraint boundary lines
          • containment easy to verify with a constraint solver
        • non-linear dependencies create half-spaces with curved boundary lines
          • precise containment becomes intractable, too difficult to pin down efficiently
  • types may have one or more "holes" for unbound variables

    • this is just a higher order type
    • holes themselves can be constrained w/ types (representing bounded polymorphism)
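
A one-dimensional sketch of the containment idea, over closed intervals only (open bounds would lean on the epsilon fns above; assumes the hypothetical `bw.encode`, whose output sorts bytewise):

// a serialized 1d "type" is just an interval; subsumption is two
// bitwise comparisons on the encoded bounds
function contains(outer, inner) {
  return Buffer.compare(bw.encode(outer.gte), bw.encode(inner.gte)) <= 0 &&
         Buffer.compare(bw.encode(inner.lte), bw.encode(outer.lte)) <= 0
}

// contains({ gte: 0, lte: 100 }, { gte: 10, lte: 50 }) -> true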

all of this would require a better "number" encoding

derivatives as zippers

  • one-hole contexts particularly useful to model db cursors
  • e.g. db.createReadStream('[a,b] u [d,e]')
    • http://chris-taylor.github.io/blog/2013/02/13/the-algebra-of-algebraic-data-types-part-iii/
    • n-dimensional Tuple as T ^ n
    • 1-hole context (first derivative): ∂(T ^ n) as n * T ^ (n - 1)
    • if T is a constant, ∂T = 0
      • the first derivative of any constant is void (that is, you can't create any instances of it)
      • a constant is any interval where gte === lte (degenerate open boundaries, e.g. gt === lt, are already void)
    • this is surprisingly intuitive
    • it extends naturally to sums, products, and even compositions (higher order kinds, via the chain rule)
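
A tiny sketch of the cursor-as-zipper idea over a plain array (the focus is the filled hole; `left` and `right` are the already-visited and still-pending keys):

// the one-hole context of an array: everything behind the cursor,
// the focused element, and everything ahead of it
function zipper(items, index) {
  return {
    left: items.slice(0, index),
    focus: items[index],
    right: items.slice(index + 1)
  }
}

// zipper(['a', 'b', 'c', 'd', 'e'], 2)
// -> { left: ['a','b'], focus: 'c', right: ['d','e'] }
// plucking out the focus leaves n * T^(n-1) possible shapes, one per hole position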