grncdr/codd.coffee.md

## codd.coffee.md

      
    Raw
  

              codd.coffee.md
            
          
    Codd - building immutable SQL queries from relational primitives

Codd is a new SQL query building library that is the spiritual successor to
Gesundheit. In Gesundheit, a
"query" was a mutable object with methods that modified a (nominally) internal
AST for a SQL statement. While this enabled an API that was both familiar and
expressive, it had some unfortunate side-effects (pun intended). For example,
making a non-destructive modification to an existing query requires a deep copy
of it's AST.
Instead, Codd uses a generative approach, where each refinement generates a
new, immutable object. The immutability means these objects can safely hold
references to eachother and be passed around without unintended side-effects.
The other major difference is that Codd is written as a literate CoffeeScript
document: the document you are currently reading!
Contents

The first section contains some full example queries, which also function as
unit tests. Then, after a short prelude in section 2 to set the
stage, sections 3-11 implement the major node types, building up to full
statements in section 11. The remaining sections implement the
more complicated machinery required by the preceding sections with expansive
explanations.

Synopsis
Prelude


The rendering process
A couple of helpers


Simple Nodes
Relations
Tables and columns
Projections
Restrictions
Joins
Aliases
Comparable objects
Conditions


makeConditions


Statements


SQL Functions
Bound parameters
Other handy things


Lists


Digression - memoRoot
Implementation of immutable node lists
Specializations of lists
Node tagging


defaultDialect implementation

Synopsis

test = require('tap').test
test 'Synopsis', (t) ->
The central concept of Codd is (perhaps unsurprisingly) the
[relation](http://en.wikipedia.org/wiki/Relation_(database). Though it's common
to refer to a database table as a relation, not all relations are tables. In
Codd's lexicon, "table" is reserved explicitly for a named relation, while
both joins (combination of tables) and restriction (subset of rows from
a relation) are derived relations.
  people = table('people')

  t.test 'Not all relations are tables', (t) ->
    t.ok(people.isTable,                          "Table is a table")
    t.ok(people.isRelation,                       "Table is a relation")
    t.ok(not people.where(name: 'Jerry').isTable, "Restriction is not table")
    t.end()
Regardless of whether a relation is a table or not, it hax methods for various
relational operations, such as column accessors and joins.
  posts    = table('posts')
  personId = people.c('id')
  authorId = posts.c('author_id')
  peoplePosts = people.innerJoin(posts, authorId.eq(personId))
While JavaScript objects representing relations are all well and good, you more
likely want to operate on an actual database, and the lingua franca (for better
or worse) is SQL. SQL is statement based, so in order to interact with a
database that speaks SQL, we must construct a statement and
render it to a string.
  t.test "statements and rendering", (t) ->
    ppQuery = select(peoplePosts)
    sql = render(ppQuery)

    t.equal sql, """
      SELECT *
      FROM "people"
        INNER JOIN "posts" ON "posts"."author_id" = "people"."id"
      """
Beyond simply rendering, a statement is also a relation, and can be restricted, joined, and projected.
    pppQuery = ppQuery.where('posts.published': true)

    t.equal render(pppQuery), """
      SELECT *
      FROM "people"
        INNER JOIN "posts" ON "posts"."author_id" = "people"."id"
      WHERE "posts"."published" = $1
      """,
      'select(...).restrict(...)'
      
    comments      = table('comments')
    commentPostId = comments.c('post_id')
    postId        = posts.c('id')

    pppAndCommentsQuery = pppQuery
      .innerJoin(comments, commentPostId.eq(postId))

    t.equal render(pppAndCommentsQuery), """
      SELECT *
      FROM "people"
        INNER JOIN "posts" ON "posts"."author_id" = "people"."id"
        INNER JOIN "comments" ON "comments"."post_id" = "posts"."id"
      WHERE "posts"."published" = $1""",
      'select(...).join(...)'
Of course, we can also derive our joined and restricted relation before creating the select statement.
    publishedPeoplePostsAndComments = people
      .innerJoin(posts.where(published: true), authorId.eq(personId))
      .innerJoin(comments, commentPostId.eq(postId))

    t.equal(
      render(select(publishedPeoplePostsAndComments)),
      render(pppAndCommentsQuery)
    )

    t.end() # End statements and rendering
The important point is that the results are equivalent whether we join posts,
people and comments before or after creating the select statement.
  t.end() # End Synopsis
  
test 'Restricting relations', (t) ->
  myres = table('myrel').restrict(mycol: 2)
  t.ok(myres.conditions, "The restriction has a `conditions` property")
  t.type(myres.conditions, 'object')
  t.equal('"myrel"."mycol" = $1', render(myres.conditions))
  t.end()
Prelude

The rendering process

Every AST node is responsible for rendering itself, but can rely on a dialect
visitor object to provide access to pluggable behaviours like SQL identifier
quoting.
The rendering process is initiated by calling the render function, which
creates a visitor and uses it to render a root node:
render = (root, dialect=defaultDialect) ->
  dialect.createVisitor().render(root)
(The implementation of defaultDialect is
shown later in this file).
A couple of helpers

A great deal of functionality is implemented by augmenting objects using simple
decorators. The augmenting decorators themselves are created with an augment
decorator that wraps any function so that it always returns it's first argument.
augment = (fn) -> (first) ->
  fn.apply(@, arguments)
  return first
Another common pattern is wrapping an object with a facade object that has it's
__proto__ set to the wrapped object. We define the wrap decorator to handle
the mechanics of this pattern.
wrap = (fn) -> (first) ->
  result = fn.apply(@, arguments)
  result.__proto__ = first
  return result
Simple Nodes

The most basic node, text, completely ignore the dialect when rendering,
and simply returns it's value directly.
text = (input) ->
  if typeof input isnt 'string'
    throw new Error("Input is not a string: #{input}")
  render: -> input
An identifier represents a table or column name, and relies on the dialect to
properly quote itself:
identifier = (input) ->
  text input
  render: (dialect) -> dialect.quote(input)
Relations

A relation is defined by the derivations that can be performed on it. Methods
for joining other relations, restricting a relation to a subset of
rows, and projecting a subset of columns are all
added to a source object with corresponding augmentation decorators.
relation = (source) ->
  tag 'Relation', aliasable joinable restrictable projectable source
The decorators above are implemented later on in this document, but here is an
list of the operations they add (not all join methods are listed).
test 'relation operations', (t) ->
  rel = relation({})
  t.ok(rel.isRelation, "relations are tagged 'isRelation'")
  t.type(rel[meth], 'function', "Relations can '#{meth}'") for meth in [
    'as'
    'innerJoin'
    'restrict'
    'where'
    'project'
  ]
  t.end()
Tables and columns

A table is an identifer that has been decorated as a relation and
augmented with methods for accessing the table name and creating columns that
refer back to it.
table = (name) ->
  tbl = relation identifier name
  tbl.name = -> name
  tbl.c = tbl.column = (name) -> column this, name
  tag 'Table', tbl
Notice that there is no way to modify the internal state (the name variable),
after the table has been created. This pattern will repeat throughout Codd.
A column is a comparable, aliasable identifier.
column = (source, name) ->
  col = comparable aliasable identifier name
  col.name = -> name
  col.render = (dialect) ->
    relName = dialect.quote(source.name())
    colName = if name is '*' then '*' else dialect.quote(name)
    relName + '.' + colName

  return col
The render method of columns is customized to be aware of aliasing so that
columns sourced from aliased relations are rendered correctly.
Projections

A projection is a list of columns derived from a "source" such as a table or
join.
Here we see our first decorator definition, projectable, that adds a .project
method to it's target. Recall that decorators return
their
first argument.
projectable = augment (source) ->
  source.project = (names...) -> projection(@, names)
The projection constructor itself creates an (internal) column list by
calling source.column, later we'll see how this can be used for projecting
from joins.
FROM = text("\nFROM ")
projection = wrap (source, names, parent_cols=null) ->
  cols = list((source.column(name) for name in names), ', ', parent_cols)
  p = tag 'Projection', restrictable list([cols, FROM, source], '')
  p.columns = cols
  p.column = p.c = (name) ->
    colnames = []
    for col in @columns.map((c) -> c)
      colname = col.name()
      return col if colname is name
      colnames.push colname
    throw new Error("Column #{name} is not present in #{colnames}")
  return p
Restrictions

Restrictions, (better known as WHERE clauses) are an essential part of
working with relational data. Relations, projections, and restrictions
themselves are all "restrictable", which means that we can construct a new
restriction using them as a source.
restrictable = augment (source) ->
  source.restrict = source.where = (conditions...) ->
    restriction @, conditions
(we also alias .where for familiarity)
The restriction constructor coerces the conditions input into proper nodes
using a conditionFactory, then returns a new wrapper holding
the condition list.
restriction = wrap (source, conditions) ->
  conditions = conditions.map(conditionFactory(source))
  return {
    isTable: false
    params: -> @conditions.params() ? []
    conditions: list(conditions, ' AND ', source.conditions)
  }
Joins

A join is a list of relations that can (for the most part) be treated like a
single wide table. E.g. you can access/project columns from any/all of the
relations it contains.
First off, we define a joinable decorator that adds multiple join constructor
methods to a source object. (This pattern is probably becoming familiar ;)
joinable = augment (source) ->
  source[name] = method for name, method of JOINS
The join methods themselves are quite complicated, and the rest of this section
explains their implementation in depth. We start with a list of the joins we
want to create methods for.
JOINS = {}

[
  'INNER JOIN'
  'LEFT JOIN'
  'RIGHT JOIN'
  'NATURAL JOIN'
  'CROSS JOIN'
  'LEFT OUTER JOIN'
  'RIGHT OUTER JOIN'
  'FULL OUTER JOIN'
]
.forEach (type) ->
The method names are camelCased versions of the join text, so 'LEFT OUTER  JOIN'
becomes leftOuterJoin.
  methodName = type
    .toLowerCase()
    .replace(/\s[a-z]/g, (t) -> t[1].toUpperCase())
The type itself is wrapped in a text node, we also need an ON node for when
join conditions are specified.
  TYPE = text(type)
  ON = text('ON')
  NO_CLAUSE = text('')
Each method starts by turning the given join conditions into a (possibly empty)
array.
  JOINS[methodName] = (right, conditions...) ->
    joinClause = if conditions.length > 1
      # conditions.map(conditionFactory(@))
      list [ON, list(conditions, ' AND ')]
    else if conditions.length
      list [ON, conditions[0]]
    else
      NO_CLAUSE
Joins contain a list of source relations (and their join conditions). Because
any relation (including other joins) can be either the left or right hand side
for this join, we must handle multiple disparate casea when merging the relation
lists.
Having a join on the right requires the most work, we must take a mutable copy
of it's relation list and remove the first element (a table name).
    left = this

    relationList = if right.relations
      rr = right.relations.slice()
      rrTable = rr.shift()
Once we have our rr and rrTable nodes, we must combine them with the left
hand relation, which may also have a list of source relations.
      if left.relations
        left.relations.add(list([TYPE, rrTable, joinClause]), rr...)
      else
        rr.unshift(left, list([TYPE, rrTable, joinClause]))
        list(rr)
If the right hand side is a table and not a join, combining it with the left is
simply concatenation of nodes.
    else if left.relations
      left.relations.add(list([TYPE, right, joinClause]))
    else
      list([left, list([TYPE, right, joinClause])], '\n  ')

    join = tag 'Join', relation relationList
Recall that the projectable decorator (applied by the relation decorator) requires a .column method on it's target.
Because a join can contain one or more relations, .column must special-case
star projections to ensure they apply to all relations in the join.
    join.column = join.c = (colname) ->
      return text('*') if colname is '*'
Furthermore, the column method must support specifying the
table as part of the column name, this can either be given as a two element
array, or a dot-separated string.
      if Array.isArray(colname)
        [relname, colname] = colname
      else
        [relname, colname] = colname.split('.')

      if colname?
        @table(relname).column(colname)
      else
        right.column(relname)
The .table method used in .column ensures that the table being
accessed is actually included in the join.
    join.table = (name) ->
      if left.isJoin # left hand side is a join
        try
          return left.table(name)
        catch e
          null
      else
        if left.name() is name then return left

      if right.name() is name then return right
      else throw new Error("Table #{name} not joined")
We must propagate projections and restrictions
from our source relations to the new join object before returning it.
    for prop in ['columns', 'conditions']
      if left[prop] and right[prop]
        join[prop] = left[prop].add(right[prop])
      else if left[prop]
        join[prop] = left[prop]
      else if right[prop]
        join[prop] = right[prop]
Finally, if we are joining a select statement, we return a new select statement
referencing the join.
    if left.isSelect or right.isSelect
      select join
    else
      join
Aliases

SQL supports aliasing tables and columns. To that end, the alias constructor
takes an arbitrary object and renders it with an alias name. Client code that
needs to know if an object is an alias must check for the presence of a name
method. We also include a simple decorator that adds an as method to it's
argument for easier alias creation.
alias = wrap (thing, name) ->
  as: null
  isAlias: true
  name: -> name
  render: (dialect) -> dialect.render(thing) + ' AS ' + dialect.quote(name)

aliasable = augment (thing) ->
  thing.as = (name) -> alias(thing, name)
Comparable objects

Columns are comparable; they have methods that generate binary comparison
nodes for use in restrictions and join conditions. Our decorator augments it's target with these comparison methods.
comparable = augment (obj) ->
  obj[k] = v for k, v of COMPARISONS
The comparison objects themselves are fairly simple, simply delegating to a
.compare method that constructs a binary node and tags it as a condition.
COMPARISONS =
  eq:  (other) -> @compare('=',  other)
  ne:  (other) -> @compare('<>', other)
  gt:  (other) -> @compare('>',  other)
  lt:  (other) -> @compare('<',  other)
  lte: (other) -> @compare('<=', other)
  gte: (other) -> @compare('>=', other)
  compare: (op, other) ->
    tag 'Condition', (binary(@, op, maybeParam(other)))

maybeParam = (node) -> if node.render then node else parameter(node)

test 'Comparison rendering', (t) ->
  one = comparable integer(1)
  comp = one.eq(one)
  t.equal(render(comp), '1 = 1', 'Simple comparison')
  t.end()
This tagging is indirectly by the .restrict method via the
condition factory in the next section.
Conditions

Conditions are lists of one or more comparisons that define what rows will be
allowed in a restriction. They can be combined with a logical AND or OR, and
may also be nested to express very complex conditions.
Conditions do not have standalone constructors, instead they are constructed by
a condition factory. The condition factory's sole purpose is to hold onto a
reference to a source for columns to be used when constructing comparisons.
conditionFactory = (source) -> (cond) ->
  if cond.isCondition
    cond
  else
    conds = makeConditions(cond, source)
    if conds.length is 1
      conds[0]
    else
      tag 'Condition',  list(conds, ' AND ')
makeConditions

The only complexity in conditionFactory is in avoiding redundant work. The
heavy lifting is all done by makeConditions which can turn a rather wide
variety of input into an array of comparison and list nodes.
makeConditions = (predicate, source) ->
  conditions = []
Our first check is if the predicate is already a condition, in which case we
return it immediately as a single item array.
  if predicate.isCondition
    return [predicate]
The next check is if our predicate is an array. We receive arrays of predicates
from recursive calls to makeConditions when an object argument contains one of
the special keys "and" or "or".
  if Array.isArray(predicate)
    for item in predicate
      if item.isCondition
        conditions.push(item)
      else
        conditions = conditions.concat(makeConditions(item, source))
    return conditions
Finally, if we have gotten this far we can assume that our argument was a condition literal: a literal object that uses column names as it's keys and
various constraints for it's values.
  for colname, constraint of predicate
    if colname is 'and'
      conditions.push AND(constraint, source)
    else if colname is 'or'
      conditions.push OR(constraint, source)
Unless, of course, the key is "and" or "or", in which case me make a recursive
call and construct a parenthesized list with the result. With that out
of the way, we can really assume that keys are column names...
    else
      col = source.column colname
      if constraint is null
        conditions.push col.compare('IS', text('NULL'))
      else if constraint.render? or constraint.constructor isnt Object
        conditions.push col.eq(constraint)
      else
        for op, val of constraint
          conditions.push col[op](val)

  # And finally return the array of conditions
  conditions
That's a fairly complicated chunk of code, so let's break it down a little:

Get a column object from our source.
If our constraint is null, the condition IS NULL
If the constraint has a .render property, or it's not an object at all
(i.e. - it's a number or string) create an equality comparison. Note that the
call to .eq will create a bound parameter for the constraint if necessary.
Finally, given an object without a .render method, we treat each of it's
keys as comparison methods. This is used for comparisons with a literal
shorthand (e.g .where(my_column: {gt: 2})).

AND = (conds, source) ->
  tag 'Condition',  parenthesized list(makeConditions(conds, source), ' AND ')

OR = (conds, source) ->
  tag 'Condition',  parenthesized list(makeConditions(conds, source), ' OR ')
Statements

The root of an AST is a normally a statement, which can be one of SELECT,
UPDATE, INSERT, or DELETE. Each of these statement types has a
constructor function of the same name (delete is a keyword in javascript so
our constructor is suffixed with an underscore).
# TODO groupable, orderable, limitable, offsetable
select = wrap (source) ->
  tag 'Select',
    render: memoRoot (dialect) ->
      'SELECT ' + [
        (if source.isProjection then source else source.project('*'))
        prefixed(@conditions, "\nWHERE ")
      ].map(dialect.render).filter(Boolean).join('')
    relations: source.isJoin and source

test "select statements", (t) ->
  posts = table('posts')
  t.ok(select(posts).isSelect)
  t.ok(
    select(posts.where(published: true)).conditions
    "Selects expose .conditions from their source"
  )

  t.ok(
    select(posts).column('published').eq(true)
    "Selects act like relations"
  )
  publishedPosts = select(posts).where(published: true)

  t.equal(
    render(select(posts).where(published: true))
    render(select(posts.where(published: true)))
    "Selects can be restricted rather than their source"
  )

  t.end()
SQL Functions

sqlFunction = (name, args) ->
  args = list(args, ', ')
  return {
    render: (dialect) -> "#{name}(#{dialect.render(args)})"
    params: list.params
  }
Bound parameters

parameter = (it) ->
  render: (dialect) ->
    dialect.placeholder()
  params: -> [it]
Other handy things

prefixed = wrap (thing, prefix) ->
  render: (dialect) ->
    if (out = thing and dialect.render(thing)) then prefix + out else ""

integer = (input) ->
  parsed = parseInt input
  if isNaN parsed
    throw new Error("Invalid integer: #{input}")
  render: -> parsed

limit  = (val) -> prefixed integer(val), 'LIMIT '
offset = (val) -> prefixed integer(val), 'OFFSET '

binary = (left, op, right) -> fixed list([left, text(op), right])

test 'binary nodes', (t) ->
  [left, right] = ['left', 'right'].map(text)
  t.equal(render(binary(left, 'op', right)), 'left op right')
  t.end()
Lists

A list is the most complicated kind of node, as they are an immutable
aggregate in a language that has no such concept natively. Lists are also where
the biggest performance optimization in Codd is implemented: because our lists
of nodes are immutable, we can safely memoize the result of rendering any list.
Digression - memoRoot

However, memoizing the rendered form of every list (and sublist) would lead
to a great number of redundant substrings taking up our precious memory. To show
how Codd avoids this, we need to understand a further detail of the rendering
process:

When a dialect visitor calls the .render method of a node, it pass two
arguments: itself, and a context object. This context object has a property,
root, that indicates whether this node is the root.

Using this, can implement a "memoRoot" decorator that memoizes the result of a
render call only when the node is the root:
memoRoot = (fn) ->
  result = undefined
  return (dialect, ctx) ->
    if ctx?.root
      result ?= fn.call(this, dialect, ctx)
    else
      fn.call(this, dialect, ctx)
Implementation of immutable node lists

Returning to the implementation of lists, we see that the constructor takes 3
arguments: an array of child nodes, a string with which the rendered child nodes
will be joined, and an optional parent list, which will be rendered in front of
it's immediate children.
list = (nodes=[], glue=' ', parent=null) ->
  if not nodes.length
    console.trace("Empty list created")
    return text('')
First we need to take a shallow copy of nodes so that our caller can't mutate
the array underneath us.
  nodes = nodes.slice()
We also create a callback function that will be used for collecting query
parameters from child nodes.
  childParams = ->
    nodes.map((n) -> n.params?() ? []).reduce((a, b) -> a.concat(b))
Finally, we construct and return the list object itself. Note how the add
method returns a new list.
  return {
    render: (dialect, path) ->
      prefix = (parent and dialect.render(parent) + glue) or ''
      prefix + nodes.map(dialect.render).filter(Boolean).join(glue)

    params:          -> (parent?.params() ? []).concat(childParams())
    add: (nodes...)  -> list(nodes, glue, this)
    slice: (a, b, c) -> nodes.slice(a, b, c)
    map: (fn)        -> nodes.map(fn, this)
  }
Specializations of lists

Tuples and fixed lists are simple wrappers around lists.
parenthesized = augment (list) ->
  original = list.render
  list.render = -> '(' + original.apply(@, arguments) + ')'

fixed = augment (list) -> list.add = null

tuple = (nodes...) -> parenthesized list(nodes, ', ')
Node tagging

tag = (names...) ->
  node = names.pop()
  for name in names
    node['is' + name] = true
  node
defaultDialect implementation

The defaultDialect visitor doesn't do anything too exciting, just initializes
some global state to track the current location in the tree, and passes itself
to the given nodes render method.
defaultDialect =
  createVisitor: ->
    nodepath = []
    placeholderCount = 1
    return visitor =
      quote: (s) -> '"' + s + '"'
      placeholder: -> "$#{placeholderCount++}"
      render: (node) ->
        nodepath.push(node)
        debugger unless node.render
        out = node.render(visitor, root: !nodepath.length)
        nodepath.pop(node)
        return out

  
## toc.coffee.md

      
    Raw
  

              toc.coffee.md
            
          
    This script generates a MarkDown table of contents from a source file. There are probably (many) better tools for this sort of thing, but I just needed a small step up from my vim macro.
fs = require 'fs'
source = fs.readFileSync(process.argv[2]).toString()
pad = (s, width, padding=' ') ->
  (padding for _ in [s.length...width]).join('') + s

section = 1
console.log source
  .split('\n')
  .filter((ln) -> ln[0] is '#')
  .slice(2)  # Ignore title and contents heading
  .map((ln) ->
    [_, markers, title] = ln.match(/^(#+) (.+)/)
    level = markers.length
    li = if level is 1
      # God help me if I ever have 100 sections...
      pad = (section < 10 and '0') or ''
      pad + (section++) + "."
    else
      markers.replace(/#/g, ' ') + '*'
    "#{li} [#{title}](##{title.replace(/\W/g, '-').toLowerCase()})"
  )
  .join('\n')