Skip to content

Instantly share code, notes, and snippets.

@grncdr
Last active December 14, 2015 06:08
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save grncdr/5039898 to your computer and use it in GitHub Desktop.
Save grncdr/5039898 to your computer and use it in GitHub Desktop.

Codd - building immutable SQL queries from relational primitives

Codd is a new SQL query building library that is the spiritual successor to Gesundheit. In Gesundheit, a "query" was a mutable object with methods that modified a (nominally) internal AST for a SQL statement. While this enabled an API that was both familiar and expressive, it had some unfortunate side-effects (pun intended). For example, making a non-destructive modification to an existing query requires a deep copy of it's AST.

Instead, Codd uses a generative approach, where each refinement generates a new, immutable object. The immutability means these objects can safely hold references to eachother and be passed around without unintended side-effects.

The other major difference is that Codd is written as a literate CoffeeScript document: the document you are currently reading!

Contents

The first section contains some full example queries, which also function as unit tests. Then, after a short prelude in section 2 to set the stage, sections 3-11 implement the major node types, building up to full statements in section 11. The remaining sections implement the more complicated machinery required by the preceding sections with expansive explanations.

  1. Synopsis
  2. Prelude
  1. Simple Nodes
  2. Relations
  3. Tables and columns
  4. Projections
  5. Restrictions
  6. Joins
  7. Aliases
  8. Comparable objects
  9. Conditions
  1. Statements
  1. Lists
  1. defaultDialect implementation

Synopsis

test = require('tap').test
test 'Synopsis', (t) ->

The central concept of Codd is (perhaps unsurprisingly) the [relation](http://en.wikipedia.org/wiki/Relation_(database). Though it's common to refer to a database table as a relation, not all relations are tables. In Codd's lexicon, "table" is reserved explicitly for a named relation, while both joins (combination of tables) and restriction (subset of rows from a relation) are derived relations.

  people = table('people')

  t.test 'Not all relations are tables', (t) ->
    t.ok(people.isTable,                          "Table is a table")
    t.ok(people.isRelation,                       "Table is a relation")
    t.ok(not people.where(name: 'Jerry').isTable, "Restriction is not table")
    t.end()

Regardless of whether a relation is a table or not, it hax methods for various relational operations, such as column accessors and joins.

  posts    = table('posts')
  personId = people.c('id')
  authorId = posts.c('author_id')
  peoplePosts = people.innerJoin(posts, authorId.eq(personId))

While JavaScript objects representing relations are all well and good, you more likely want to operate on an actual database, and the lingua franca (for better or worse) is SQL. SQL is statement based, so in order to interact with a database that speaks SQL, we must construct a statement and render it to a string.

  t.test "statements and rendering", (t) ->
    ppQuery = select(peoplePosts)
    sql = render(ppQuery)

    t.equal sql, """
      SELECT *
      FROM "people"
        INNER JOIN "posts" ON "posts"."author_id" = "people"."id"
      """

Beyond simply rendering, a statement is also a relation, and can be restricted, joined, and projected.

    pppQuery = ppQuery.where('posts.published': true)

    t.equal render(pppQuery), """
      SELECT *
      FROM "people"
        INNER JOIN "posts" ON "posts"."author_id" = "people"."id"
      WHERE "posts"."published" = $1
      """,
      'select(...).restrict(...)'
      
    comments      = table('comments')
    commentPostId = comments.c('post_id')
    postId        = posts.c('id')

    pppAndCommentsQuery = pppQuery
      .innerJoin(comments, commentPostId.eq(postId))

    t.equal render(pppAndCommentsQuery), """
      SELECT *
      FROM "people"
        INNER JOIN "posts" ON "posts"."author_id" = "people"."id"
        INNER JOIN "comments" ON "comments"."post_id" = "posts"."id"
      WHERE "posts"."published" = $1""",
      'select(...).join(...)'

Of course, we can also derive our joined and restricted relation before creating the select statement.

    publishedPeoplePostsAndComments = people
      .innerJoin(posts.where(published: true), authorId.eq(personId))
      .innerJoin(comments, commentPostId.eq(postId))

    t.equal(
      render(select(publishedPeoplePostsAndComments)),
      render(pppAndCommentsQuery)
    )

    t.end() # End statements and rendering

The important point is that the results are equivalent whether we join posts, people and comments before or after creating the select statement.

  t.end() # End Synopsis
  
test 'Restricting relations', (t) ->
  myres = table('myrel').restrict(mycol: 2)
  t.ok(myres.conditions, "The restriction has a `conditions` property")
  t.type(myres.conditions, 'object')
  t.equal('"myrel"."mycol" = $1', render(myres.conditions))
  t.end()

Prelude

The rendering process

Every AST node is responsible for rendering itself, but can rely on a dialect visitor object to provide access to pluggable behaviours like SQL identifier quoting.

The rendering process is initiated by calling the render function, which creates a visitor and uses it to render a root node:

render = (root, dialect=defaultDialect) ->
  dialect.createVisitor().render(root)

(The implementation of defaultDialect is shown later in this file).

A couple of helpers

A great deal of functionality is implemented by augmenting objects using simple decorators. The augmenting decorators themselves are created with an augment decorator that wraps any function so that it always returns it's first argument.

augment = (fn) -> (first) ->
  fn.apply(@, arguments)
  return first

Another common pattern is wrapping an object with a facade object that has it's __proto__ set to the wrapped object. We define the wrap decorator to handle the mechanics of this pattern.

wrap = (fn) -> (first) ->
  result = fn.apply(@, arguments)
  result.__proto__ = first
  return result

Simple Nodes

The most basic node, text, completely ignore the dialect when rendering, and simply returns it's value directly.

text = (input) ->
  if typeof input isnt 'string'
    throw new Error("Input is not a string: #{input}")
  render: -> input

An identifier represents a table or column name, and relies on the dialect to properly quote itself:

identifier = (input) ->
  text input
  render: (dialect) -> dialect.quote(input)

Relations

A relation is defined by the derivations that can be performed on it. Methods for joining other relations, restricting a relation to a subset of rows, and projecting a subset of columns are all added to a source object with corresponding augmentation decorators.

relation = (source) ->
  tag 'Relation', aliasable joinable restrictable projectable source

The decorators above are implemented later on in this document, but here is an list of the operations they add (not all join methods are listed).

test 'relation operations', (t) ->
  rel = relation({})
  t.ok(rel.isRelation, "relations are tagged 'isRelation'")
  t.type(rel[meth], 'function', "Relations can '#{meth}'") for meth in [
    'as'
    'innerJoin'
    'restrict'
    'where'
    'project'
  ]
  t.end()

Tables and columns

A table is an identifer that has been decorated as a relation and augmented with methods for accessing the table name and creating columns that refer back to it.

table = (name) ->
  tbl = relation identifier name
  tbl.name = -> name
  tbl.c = tbl.column = (name) -> column this, name
  tag 'Table', tbl

Notice that there is no way to modify the internal state (the name variable), after the table has been created. This pattern will repeat throughout Codd.

A column is a comparable, aliasable identifier.

column = (source, name) ->
  col = comparable aliasable identifier name
  col.name = -> name
  col.render = (dialect) ->
    relName = dialect.quote(source.name())
    colName = if name is '*' then '*' else dialect.quote(name)
    relName + '.' + colName

  return col

The render method of columns is customized to be aware of aliasing so that columns sourced from aliased relations are rendered correctly.

Projections

A projection is a list of columns derived from a "source" such as a table or join.

Here we see our first decorator definition, projectable, that adds a .project method to it's target. Recall that decorators return their first argument.

projectable = augment (source) ->
  source.project = (names...) -> projection(@, names)

The projection constructor itself creates an (internal) column list by calling source.column, later we'll see how this can be used for projecting from joins.

FROM = text("\nFROM ")
projection = wrap (source, names, parent_cols=null) ->
  cols = list((source.column(name) for name in names), ', ', parent_cols)
  p = tag 'Projection', restrictable list([cols, FROM, source], '')
  p.columns = cols
  p.column = p.c = (name) ->
    colnames = []
    for col in @columns.map((c) -> c)
      colname = col.name()
      return col if colname is name
      colnames.push colname
    throw new Error("Column #{name} is not present in #{colnames}")
  return p

Restrictions

Restrictions, (better known as WHERE clauses) are an essential part of working with relational data. Relations, projections, and restrictions themselves are all "restrictable", which means that we can construct a new restriction using them as a source.

restrictable = augment (source) ->
  source.restrict = source.where = (conditions...) ->
    restriction @, conditions

(we also alias .where for familiarity)

The restriction constructor coerces the conditions input into proper nodes using a conditionFactory, then returns a new wrapper holding the condition list.

restriction = wrap (source, conditions) ->
  conditions = conditions.map(conditionFactory(source))
  return {
    isTable: false
    params: -> @conditions.params() ? []
    conditions: list(conditions, ' AND ', source.conditions)
  }

Joins

A join is a list of relations that can (for the most part) be treated like a single wide table. E.g. you can access/project columns from any/all of the relations it contains.

First off, we define a joinable decorator that adds multiple join constructor methods to a source object. (This pattern is probably becoming familiar ;)

joinable = augment (source) ->
  source[name] = method for name, method of JOINS

The join methods themselves are quite complicated, and the rest of this section explains their implementation in depth. We start with a list of the joins we want to create methods for.

JOINS = {}

[
  'INNER JOIN'
  'LEFT JOIN'
  'RIGHT JOIN'
  'NATURAL JOIN'
  'CROSS JOIN'
  'LEFT OUTER JOIN'
  'RIGHT OUTER JOIN'
  'FULL OUTER JOIN'
]
.forEach (type) ->

The method names are camelCased versions of the join text, so 'LEFT OUTER JOIN' becomes leftOuterJoin.

  methodName = type
    .toLowerCase()
    .replace(/\s[a-z]/g, (t) -> t[1].toUpperCase())

The type itself is wrapped in a text node, we also need an ON node for when join conditions are specified.

  TYPE = text(type)
  ON = text('ON')
  NO_CLAUSE = text('')

Each method starts by turning the given join conditions into a (possibly empty) array.

  JOINS[methodName] = (right, conditions...) ->
    joinClause = if conditions.length > 1
      # conditions.map(conditionFactory(@))
      list [ON, list(conditions, ' AND ')]
    else if conditions.length
      list [ON, conditions[0]]
    else
      NO_CLAUSE

Joins contain a list of source relations (and their join conditions). Because any relation (including other joins) can be either the left or right hand side for this join, we must handle multiple disparate casea when merging the relation lists.

Having a join on the right requires the most work, we must take a mutable copy of it's relation list and remove the first element (a table name).

    left = this

    relationList = if right.relations
      rr = right.relations.slice()
      rrTable = rr.shift()

Once we have our rr and rrTable nodes, we must combine them with the left hand relation, which may also have a list of source relations.

      if left.relations
        left.relations.add(list([TYPE, rrTable, joinClause]), rr...)
      else
        rr.unshift(left, list([TYPE, rrTable, joinClause]))
        list(rr)

If the right hand side is a table and not a join, combining it with the left is simply concatenation of nodes.

    else if left.relations
      left.relations.add(list([TYPE, right, joinClause]))
    else
      list([left, list([TYPE, right, joinClause])], '\n  ')

    join = tag 'Join', relation relationList

Recall that the projectable decorator (applied by the relation decorator) requires a .column method on it's target. Because a join can contain one or more relations, .column must special-case star projections to ensure they apply to all relations in the join.

    join.column = join.c = (colname) ->
      return text('*') if colname is '*'

Furthermore, the column method must support specifying the table as part of the column name, this can either be given as a two element array, or a dot-separated string.

      if Array.isArray(colname)
        [relname, colname] = colname
      else
        [relname, colname] = colname.split('.')

      if colname?
        @table(relname).column(colname)
      else
        right.column(relname)

The .table method used in .column ensures that the table being accessed is actually included in the join.

    join.table = (name) ->
      if left.isJoin # left hand side is a join
        try
          return left.table(name)
        catch e
          null
      else
        if left.name() is name then return left

      if right.name() is name then return right
      else throw new Error("Table #{name} not joined")

We must propagate projections and restrictions from our source relations to the new join object before returning it.

    for prop in ['columns', 'conditions']
      if left[prop] and right[prop]
        join[prop] = left[prop].add(right[prop])
      else if left[prop]
        join[prop] = left[prop]
      else if right[prop]
        join[prop] = right[prop]

Finally, if we are joining a select statement, we return a new select statement referencing the join.

    if left.isSelect or right.isSelect
      select join
    else
      join

Aliases

SQL supports aliasing tables and columns. To that end, the alias constructor takes an arbitrary object and renders it with an alias name. Client code that needs to know if an object is an alias must check for the presence of a name method. We also include a simple decorator that adds an as method to it's argument for easier alias creation.

alias = wrap (thing, name) ->
  as: null
  isAlias: true
  name: -> name
  render: (dialect) -> dialect.render(thing) + ' AS ' + dialect.quote(name)

aliasable = augment (thing) ->
  thing.as = (name) -> alias(thing, name)

Comparable objects

Columns are comparable; they have methods that generate binary comparison nodes for use in restrictions and join conditions. Our decorator augments it's target with these comparison methods.

comparable = augment (obj) ->
  obj[k] = v for k, v of COMPARISONS

The comparison objects themselves are fairly simple, simply delegating to a .compare method that constructs a binary node and tags it as a condition.

COMPARISONS =
  eq:  (other) -> @compare('=',  other)
  ne:  (other) -> @compare('<>', other)
  gt:  (other) -> @compare('>',  other)
  lt:  (other) -> @compare('<',  other)
  lte: (other) -> @compare('<=', other)
  gte: (other) -> @compare('>=', other)
  compare: (op, other) ->
    tag 'Condition', (binary(@, op, maybeParam(other)))

maybeParam = (node) -> if node.render then node else parameter(node)

test 'Comparison rendering', (t) ->
  one = comparable integer(1)
  comp = one.eq(one)
  t.equal(render(comp), '1 = 1', 'Simple comparison')
  t.end()

This tagging is indirectly by the .restrict method via the condition factory in the next section.

Conditions

Conditions are lists of one or more comparisons that define what rows will be allowed in a restriction. They can be combined with a logical AND or OR, and may also be nested to express very complex conditions.

Conditions do not have standalone constructors, instead they are constructed by a condition factory. The condition factory's sole purpose is to hold onto a reference to a source for columns to be used when constructing comparisons.

conditionFactory = (source) -> (cond) ->
  if cond.isCondition
    cond
  else
    conds = makeConditions(cond, source)
    if conds.length is 1
      conds[0]
    else
      tag 'Condition',  list(conds, ' AND ')

makeConditions

The only complexity in conditionFactory is in avoiding redundant work. The heavy lifting is all done by makeConditions which can turn a rather wide variety of input into an array of comparison and list nodes.

makeConditions = (predicate, source) ->
  conditions = []

Our first check is if the predicate is already a condition, in which case we return it immediately as a single item array.

  if predicate.isCondition
    return [predicate]

The next check is if our predicate is an array. We receive arrays of predicates from recursive calls to makeConditions when an object argument contains one of the special keys "and" or "or".

  if Array.isArray(predicate)
    for item in predicate
      if item.isCondition
        conditions.push(item)
      else
        conditions = conditions.concat(makeConditions(item, source))
    return conditions

Finally, if we have gotten this far we can assume that our argument was a condition literal: a literal object that uses column names as it's keys and various constraints for it's values.

  for colname, constraint of predicate
    if colname is 'and'
      conditions.push AND(constraint, source)
    else if colname is 'or'
      conditions.push OR(constraint, source)

Unless, of course, the key is "and" or "or", in which case me make a recursive call and construct a parenthesized list with the result. With that out of the way, we can really assume that keys are column names...

    else
      col = source.column colname
      if constraint is null
        conditions.push col.compare('IS', text('NULL'))
      else if constraint.render? or constraint.constructor isnt Object
        conditions.push col.eq(constraint)
      else
        for op, val of constraint
          conditions.push col[op](val)

  # And finally return the array of conditions
  conditions

That's a fairly complicated chunk of code, so let's break it down a little:

  1. Get a column object from our source.
  2. If our constraint is null, the condition IS NULL
  3. If the constraint has a .render property, or it's not an object at all (i.e. - it's a number or string) create an equality comparison. Note that the call to .eq will create a bound parameter for the constraint if necessary.
  4. Finally, given an object without a .render method, we treat each of it's keys as comparison methods. This is used for comparisons with a literal shorthand (e.g .where(my_column: {gt: 2})).
AND = (conds, source) ->
  tag 'Condition',  parenthesized list(makeConditions(conds, source), ' AND ')

OR = (conds, source) ->
  tag 'Condition',  parenthesized list(makeConditions(conds, source), ' OR ')

Statements

The root of an AST is a normally a statement, which can be one of SELECT, UPDATE, INSERT, or DELETE. Each of these statement types has a constructor function of the same name (delete is a keyword in javascript so our constructor is suffixed with an underscore).

# TODO groupable, orderable, limitable, offsetable
select = wrap (source) ->
  tag 'Select',
    render: memoRoot (dialect) ->
      'SELECT ' + [
        (if source.isProjection then source else source.project('*'))
        prefixed(@conditions, "\nWHERE ")
      ].map(dialect.render).filter(Boolean).join('')
    relations: source.isJoin and source

test "select statements", (t) ->
  posts = table('posts')
  t.ok(select(posts).isSelect)
  t.ok(
    select(posts.where(published: true)).conditions
    "Selects expose .conditions from their source"
  )

  t.ok(
    select(posts).column('published').eq(true)
    "Selects act like relations"
  )
  publishedPosts = select(posts).where(published: true)

  t.equal(
    render(select(posts).where(published: true))
    render(select(posts.where(published: true)))
    "Selects can be restricted rather than their source"
  )

  t.end()

SQL Functions

sqlFunction = (name, args) ->
  args = list(args, ', ')
  return {
    render: (dialect) -> "#{name}(#{dialect.render(args)})"
    params: list.params
  }

Bound parameters

parameter = (it) ->
  render: (dialect) ->
    dialect.placeholder()
  params: -> [it]

Other handy things

prefixed = wrap (thing, prefix) ->
  render: (dialect) ->
    if (out = thing and dialect.render(thing)) then prefix + out else ""

integer = (input) ->
  parsed = parseInt input
  if isNaN parsed
    throw new Error("Invalid integer: #{input}")
  render: -> parsed

limit  = (val) -> prefixed integer(val), 'LIMIT '
offset = (val) -> prefixed integer(val), 'OFFSET '

binary = (left, op, right) -> fixed list([left, text(op), right])

test 'binary nodes', (t) ->
  [left, right] = ['left', 'right'].map(text)
  t.equal(render(binary(left, 'op', right)), 'left op right')
  t.end()

Lists

A list is the most complicated kind of node, as they are an immutable aggregate in a language that has no such concept natively. Lists are also where the biggest performance optimization in Codd is implemented: because our lists of nodes are immutable, we can safely memoize the result of rendering any list.

Digression - memoRoot

However, memoizing the rendered form of every list (and sublist) would lead to a great number of redundant substrings taking up our precious memory. To show how Codd avoids this, we need to understand a further detail of the rendering process:

When a dialect visitor calls the .render method of a node, it pass two arguments: itself, and a context object. This context object has a property, root, that indicates whether this node is the root.

Using this, can implement a "memoRoot" decorator that memoizes the result of a render call only when the node is the root:

memoRoot = (fn) ->
  result = undefined
  return (dialect, ctx) ->
    if ctx?.root
      result ?= fn.call(this, dialect, ctx)
    else
      fn.call(this, dialect, ctx)

Implementation of immutable node lists

Returning to the implementation of lists, we see that the constructor takes 3 arguments: an array of child nodes, a string with which the rendered child nodes will be joined, and an optional parent list, which will be rendered in front of it's immediate children.

list = (nodes=[], glue=' ', parent=null) ->
  if not nodes.length
    console.trace("Empty list created")
    return text('')

First we need to take a shallow copy of nodes so that our caller can't mutate the array underneath us.

  nodes = nodes.slice()

We also create a callback function that will be used for collecting query parameters from child nodes.

  childParams = ->
    nodes.map((n) -> n.params?() ? []).reduce((a, b) -> a.concat(b))

Finally, we construct and return the list object itself. Note how the add method returns a new list.

  return {
    render: (dialect, path) ->
      prefix = (parent and dialect.render(parent) + glue) or ''
      prefix + nodes.map(dialect.render).filter(Boolean).join(glue)

    params:          -> (parent?.params() ? []).concat(childParams())
    add: (nodes...)  -> list(nodes, glue, this)
    slice: (a, b, c) -> nodes.slice(a, b, c)
    map: (fn)        -> nodes.map(fn, this)
  }

Specializations of lists

Tuples and fixed lists are simple wrappers around lists.

parenthesized = augment (list) ->
  original = list.render
  list.render = -> '(' + original.apply(@, arguments) + ')'

fixed = augment (list) -> list.add = null

tuple = (nodes...) -> parenthesized list(nodes, ', ')

Node tagging

tag = (names...) ->
  node = names.pop()
  for name in names
    node['is' + name] = true
  node

defaultDialect implementation

The defaultDialect visitor doesn't do anything too exciting, just initializes some global state to track the current location in the tree, and passes itself to the given nodes render method.

defaultDialect =
  createVisitor: ->
    nodepath = []
    placeholderCount = 1
    return visitor =
      quote: (s) -> '"' + s + '"'
      placeholder: -> "$#{placeholderCount++}"
      render: (node) ->
        nodepath.push(node)
        debugger unless node.render
        out = node.render(visitor, root: !nodepath.length)
        nodepath.pop(node)
        return out

This script generates a MarkDown table of contents from a source file. There are probably (many) better tools for this sort of thing, but I just needed a small step up from my vim macro.

fs = require 'fs'
source = fs.readFileSync(process.argv[2]).toString()
pad = (s, width, padding=' ') ->
  (padding for _ in [s.length...width]).join('') + s

section = 1
console.log source
  .split('\n')
  .filter((ln) -> ln[0] is '#')
  .slice(2)  # Ignore title and contents heading
  .map((ln) ->
    [_, markers, title] = ln.match(/^(#+) (.+)/)
    level = markers.length
    li = if level is 1
      # God help me if I ever have 100 sections...
      pad = (section < 10 and '0') or ''
      pad + (section++) + "."
    else
      markers.replace(/#/g, ' ') + '*'
    "#{li} [#{title}](##{title.replace(/\W/g, '-').toLowerCase()})"
  )
  .join('\n')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment