Codd is a new SQL query building library that is the spiritual successor to Gesundheit. In Gesundheit, a "query" was a mutable object with methods that modified a (nominally) internal AST for a SQL statement. While this enabled an API that was both familiar and expressive, it had some unfortunate side-effects (pun intended). For example, making a non-destructive modification to an existing query requires a deep copy of it's AST.
Instead, Codd uses a generative approach, where each refinement generates a new, immutable object. The immutability means these objects can safely hold references to eachother and be passed around without unintended side-effects.
The other major difference is that Codd is written as a literate CoffeeScript document: the document you are currently reading!
The first section contains some full example queries, which also function as unit tests. Then, after a short prelude in section 2 to set the stage, sections 3-11 implement the major node types, building up to full statements in section 11. The remaining sections implement the more complicated machinery required by the preceding sections with expansive explanations.
- Simple Nodes
- Relations
- Tables and columns
- Projections
- Restrictions
- Joins
- Aliases
- Comparable objects
- Conditions
test = require('tap').test
test 'Synopsis', (t) ->
The central concept of Codd is (perhaps unsurprisingly) the [relation](http://en.wikipedia.org/wiki/Relation_(database). Though it's common to refer to a database table as a relation, not all relations are tables. In Codd's lexicon, "table" is reserved explicitly for a named relation, while both joins (combination of tables) and restriction (subset of rows from a relation) are derived relations.
people = table('people')
t.test 'Not all relations are tables', (t) ->
t.ok(people.isTable, "Table is a table")
t.ok(people.isRelation, "Table is a relation")
t.ok(not people.where(name: 'Jerry').isTable, "Restriction is not table")
t.end()
Regardless of whether a relation is a table or not, it hax methods for various relational operations, such as column accessors and joins.
posts = table('posts')
personId = people.c('id')
authorId = posts.c('author_id')
peoplePosts = people.innerJoin(posts, authorId.eq(personId))
While JavaScript objects representing relations are all well and good, you more likely want to operate on an actual database, and the lingua franca (for better or worse) is SQL. SQL is statement based, so in order to interact with a database that speaks SQL, we must construct a statement and render it to a string.
t.test "statements and rendering", (t) ->
ppQuery = select(peoplePosts)
sql = render(ppQuery)
t.equal sql, """
SELECT *
FROM "people"
INNER JOIN "posts" ON "posts"."author_id" = "people"."id"
"""
Beyond simply rendering, a statement is also a relation, and can be restricted, joined, and projected.
pppQuery = ppQuery.where('posts.published': true)
t.equal render(pppQuery), """
SELECT *
FROM "people"
INNER JOIN "posts" ON "posts"."author_id" = "people"."id"
WHERE "posts"."published" = $1
""",
'select(...).restrict(...)'
comments = table('comments')
commentPostId = comments.c('post_id')
postId = posts.c('id')
pppAndCommentsQuery = pppQuery
.innerJoin(comments, commentPostId.eq(postId))
t.equal render(pppAndCommentsQuery), """
SELECT *
FROM "people"
INNER JOIN "posts" ON "posts"."author_id" = "people"."id"
INNER JOIN "comments" ON "comments"."post_id" = "posts"."id"
WHERE "posts"."published" = $1""",
'select(...).join(...)'
Of course, we can also derive our joined and restricted relation before creating the select statement.
publishedPeoplePostsAndComments = people
.innerJoin(posts.where(published: true), authorId.eq(personId))
.innerJoin(comments, commentPostId.eq(postId))
t.equal(
render(select(publishedPeoplePostsAndComments)),
render(pppAndCommentsQuery)
)
t.end() # End statements and rendering
The important point is that the results are equivalent whether we join posts
,
people
and comments
before or after creating the select statement.
t.end() # End Synopsis
test 'Restricting relations', (t) ->
myres = table('myrel').restrict(mycol: 2)
t.ok(myres.conditions, "The restriction has a `conditions` property")
t.type(myres.conditions, 'object')
t.equal('"myrel"."mycol" = $1', render(myres.conditions))
t.end()
Every AST node is responsible for rendering itself, but can rely on a dialect visitor object to provide access to pluggable behaviours like SQL identifier quoting.
The rendering process is initiated by calling the render
function, which
creates a visitor and uses it to render a root node:
render = (root, dialect=defaultDialect) ->
dialect.createVisitor().render(root)
(The implementation of defaultDialect is shown later in this file).
A great deal of functionality is implemented by augmenting objects using simple
decorators. The augmenting decorators themselves are created with an augment
decorator that wraps any function so that it always returns it's first argument.
augment = (fn) -> (first) ->
fn.apply(@, arguments)
return first
Another common pattern is wrapping an object with a facade object that has it's
__proto__
set to the wrapped object. We define the wrap
decorator to handle
the mechanics of this pattern.
wrap = (fn) -> (first) ->
result = fn.apply(@, arguments)
result.__proto__ = first
return result
The most basic node, text
, completely ignore the dialect when rendering,
and simply returns it's value directly.
text = (input) ->
if typeof input isnt 'string'
throw new Error("Input is not a string: #{input}")
render: -> input
An identifier
represents a table or column name, and relies on the dialect to
properly quote itself:
identifier = (input) ->
text input
render: (dialect) -> dialect.quote(input)
A relation is defined by the derivations that can be performed on it. Methods for joining other relations, restricting a relation to a subset of rows, and projecting a subset of columns are all added to a source object with corresponding augmentation decorators.
relation = (source) ->
tag 'Relation', aliasable joinable restrictable projectable source
The decorators above are implemented later on in this document, but here is an list of the operations they add (not all join methods are listed).
test 'relation operations', (t) ->
rel = relation({})
t.ok(rel.isRelation, "relations are tagged 'isRelation'")
t.type(rel[meth], 'function', "Relations can '#{meth}'") for meth in [
'as'
'innerJoin'
'restrict'
'where'
'project'
]
t.end()
A table is an identifer that has been decorated as a relation and augmented with methods for accessing the table name and creating columns that refer back to it.
table = (name) ->
tbl = relation identifier name
tbl.name = -> name
tbl.c = tbl.column = (name) -> column this, name
tag 'Table', tbl
Notice that there is no way to modify the internal state (the name
variable),
after the table has been created. This pattern will repeat throughout Codd.
A column is a comparable, aliasable identifier.
column = (source, name) ->
col = comparable aliasable identifier name
col.name = -> name
col.render = (dialect) ->
relName = dialect.quote(source.name())
colName = if name is '*' then '*' else dialect.quote(name)
relName + '.' + colName
return col
The render method of columns is customized to be aware of aliasing so that columns sourced from aliased relations are rendered correctly.
A projection is a list of columns derived from a "source" such as a table or join.
Here we see our first decorator definition, projectable, that adds a .project
method to it's target. Recall that decorators return
their
first argument.
projectable = augment (source) ->
source.project = (names...) -> projection(@, names)
The projection
constructor itself creates an (internal) column list by
calling source.column
, later we'll see how this can be used for projecting
from joins.
FROM = text("\nFROM ")
projection = wrap (source, names, parent_cols=null) ->
cols = list((source.column(name) for name in names), ', ', parent_cols)
p = tag 'Projection', restrictable list([cols, FROM, source], '')
p.columns = cols
p.column = p.c = (name) ->
colnames = []
for col in @columns.map((c) -> c)
colname = col.name()
return col if colname is name
colnames.push colname
throw new Error("Column #{name} is not present in #{colnames}")
return p
Restrictions, (better known as WHERE
clauses) are an essential part of
working with relational data. Relations, projections, and restrictions
themselves are all "restrictable", which means that we can construct a new
restriction using them as a source.
restrictable = augment (source) ->
source.restrict = source.where = (conditions...) ->
restriction @, conditions
(we also alias .where
for familiarity)
The restriction
constructor coerces the conditions input into proper nodes
using a conditionFactory, then returns a new wrapper holding
the condition list.
restriction = wrap (source, conditions) ->
conditions = conditions.map(conditionFactory(source))
return {
isTable: false
params: -> @conditions.params() ? []
conditions: list(conditions, ' AND ', source.conditions)
}
A join is a list of relations that can (for the most part) be treated like a single wide table. E.g. you can access/project columns from any/all of the relations it contains.
First off, we define a joinable decorator that adds multiple join constructor methods to a source object. (This pattern is probably becoming familiar ;)
joinable = augment (source) ->
source[name] = method for name, method of JOINS
The join methods themselves are quite complicated, and the rest of this section explains their implementation in depth. We start with a list of the joins we want to create methods for.
JOINS = {}
[
'INNER JOIN'
'LEFT JOIN'
'RIGHT JOIN'
'NATURAL JOIN'
'CROSS JOIN'
'LEFT OUTER JOIN'
'RIGHT OUTER JOIN'
'FULL OUTER JOIN'
]
.forEach (type) ->
The method names are camelCased versions of the join text, so 'LEFT OUTER JOIN'
becomes leftOuterJoin
.
methodName = type
.toLowerCase()
.replace(/\s[a-z]/g, (t) -> t[1].toUpperCase())
The type itself is wrapped in a text node, we also need an ON
node for when
join conditions are specified.
TYPE = text(type)
ON = text('ON')
NO_CLAUSE = text('')
Each method starts by turning the given join conditions into a (possibly empty) array.
JOINS[methodName] = (right, conditions...) ->
joinClause = if conditions.length > 1
# conditions.map(conditionFactory(@))
list [ON, list(conditions, ' AND ')]
else if conditions.length
list [ON, conditions[0]]
else
NO_CLAUSE
Joins contain a list of source relations (and their join conditions). Because any relation (including other joins) can be either the left or right hand side for this join, we must handle multiple disparate casea when merging the relation lists.
Having a join on the right requires the most work, we must take a mutable copy of it's relation list and remove the first element (a table name).
left = this
relationList = if right.relations
rr = right.relations.slice()
rrTable = rr.shift()
Once we have our rr
and rrTable
nodes, we must combine them with the left
hand relation, which may also have a list of source relations.
if left.relations
left.relations.add(list([TYPE, rrTable, joinClause]), rr...)
else
rr.unshift(left, list([TYPE, rrTable, joinClause]))
list(rr)
If the right hand side is a table and not a join, combining it with the left is simply concatenation of nodes.
else if left.relations
left.relations.add(list([TYPE, right, joinClause]))
else
list([left, list([TYPE, right, joinClause])], '\n ')
join = tag 'Join', relation relationList
Recall that the projectable decorator (applied by the relation decorator) requires a .column
method on it's target.
Because a join can contain one or more relations, .column
must special-case
star projections to ensure they apply to all relations in the join.
join.column = join.c = (colname) ->
return text('*') if colname is '*'
Furthermore, the column method must support specifying the table as part of the column name, this can either be given as a two element array, or a dot-separated string.
if Array.isArray(colname)
[relname, colname] = colname
else
[relname, colname] = colname.split('.')
if colname?
@table(relname).column(colname)
else
right.column(relname)
The .table
method used in .column
ensures that the table being
accessed is actually included in the join.
join.table = (name) ->
if left.isJoin # left hand side is a join
try
return left.table(name)
catch e
null
else
if left.name() is name then return left
if right.name() is name then return right
else throw new Error("Table #{name} not joined")
We must propagate projections and restrictions from our source relations to the new join object before returning it.
for prop in ['columns', 'conditions']
if left[prop] and right[prop]
join[prop] = left[prop].add(right[prop])
else if left[prop]
join[prop] = left[prop]
else if right[prop]
join[prop] = right[prop]
Finally, if we are joining a select statement, we return a new select statement referencing the join.
if left.isSelect or right.isSelect
select join
else
join
SQL supports aliasing tables and columns. To that end, the alias
constructor
takes an arbitrary object and renders it with an alias name. Client code that
needs to know if an object is an alias must check for the presence of a name
method. We also include a simple decorator that adds an as
method to it's
argument for easier alias creation.
alias = wrap (thing, name) ->
as: null
isAlias: true
name: -> name
render: (dialect) -> dialect.render(thing) + ' AS ' + dialect.quote(name)
aliasable = augment (thing) ->
thing.as = (name) -> alias(thing, name)
Columns are comparable; they have methods that generate binary comparison nodes for use in restrictions and join conditions. Our decorator augments it's target with these comparison methods.
comparable = augment (obj) ->
obj[k] = v for k, v of COMPARISONS
The comparison objects themselves are fairly simple, simply delegating to a
.compare
method that constructs a binary node and tags it as a condition.
COMPARISONS =
eq: (other) -> @compare('=', other)
ne: (other) -> @compare('<>', other)
gt: (other) -> @compare('>', other)
lt: (other) -> @compare('<', other)
lte: (other) -> @compare('<=', other)
gte: (other) -> @compare('>=', other)
compare: (op, other) ->
tag 'Condition', (binary(@, op, maybeParam(other)))
maybeParam = (node) -> if node.render then node else parameter(node)
test 'Comparison rendering', (t) ->
one = comparable integer(1)
comp = one.eq(one)
t.equal(render(comp), '1 = 1', 'Simple comparison')
t.end()
This tagging is indirectly by the .restrict method via the condition factory in the next section.
Conditions are lists of one or more comparisons that define what rows will be
allowed in a restriction. They can be combined with a logical AND
or OR
, and
may also be nested to express very complex conditions.
Conditions do not have standalone constructors, instead they are constructed by a condition factory. The condition factory's sole purpose is to hold onto a reference to a source for columns to be used when constructing comparisons.
conditionFactory = (source) -> (cond) ->
if cond.isCondition
cond
else
conds = makeConditions(cond, source)
if conds.length is 1
conds[0]
else
tag 'Condition', list(conds, ' AND ')
The only complexity in conditionFactory
is in avoiding redundant work. The
heavy lifting is all done by makeConditions
which can turn a rather wide
variety of input into an array of comparison and list nodes.
makeConditions = (predicate, source) ->
conditions = []
Our first check is if the predicate is already a condition, in which case we return it immediately as a single item array.
if predicate.isCondition
return [predicate]
The next check is if our predicate is an array. We receive arrays of predicates
from recursive calls to makeConditions
when an object argument contains one of
the special keys "and" or "or".
if Array.isArray(predicate)
for item in predicate
if item.isCondition
conditions.push(item)
else
conditions = conditions.concat(makeConditions(item, source))
return conditions
Finally, if we have gotten this far we can assume that our argument was a condition literal: a literal object that uses column names as it's keys and various constraints for it's values.
for colname, constraint of predicate
if colname is 'and'
conditions.push AND(constraint, source)
else if colname is 'or'
conditions.push OR(constraint, source)
Unless, of course, the key is "and" or "or", in which case me make a recursive call and construct a parenthesized list with the result. With that out of the way, we can really assume that keys are column names...
else
col = source.column colname
if constraint is null
conditions.push col.compare('IS', text('NULL'))
else if constraint.render? or constraint.constructor isnt Object
conditions.push col.eq(constraint)
else
for op, val of constraint
conditions.push col[op](val)
# And finally return the array of conditions
conditions
That's a fairly complicated chunk of code, so let's break it down a little:
- Get a column object from our source.
- If our constraint is null, the condition
IS NULL
- If the constraint has a
.render
property, or it's not an object at all (i.e. - it's a number or string) create an equality comparison. Note that the call to.eq
will create a bound parameter for the constraint if necessary. - Finally, given an object without a
.render
method, we treat each of it's keys as comparison methods. This is used for comparisons with a literal shorthand (e.g.where(my_column: {gt: 2})
).
AND = (conds, source) ->
tag 'Condition', parenthesized list(makeConditions(conds, source), ' AND ')
OR = (conds, source) ->
tag 'Condition', parenthesized list(makeConditions(conds, source), ' OR ')
The root of an AST is a normally a statement, which can be one of SELECT
,
UPDATE
, INSERT
, or DELETE
. Each of these statement types has a
constructor function of the same name (delete
is a keyword in javascript so
our constructor is suffixed with an underscore).
# TODO groupable, orderable, limitable, offsetable
select = wrap (source) ->
tag 'Select',
render: memoRoot (dialect) ->
'SELECT ' + [
(if source.isProjection then source else source.project('*'))
prefixed(@conditions, "\nWHERE ")
].map(dialect.render).filter(Boolean).join('')
relations: source.isJoin and source
test "select statements", (t) ->
posts = table('posts')
t.ok(select(posts).isSelect)
t.ok(
select(posts.where(published: true)).conditions
"Selects expose .conditions from their source"
)
t.ok(
select(posts).column('published').eq(true)
"Selects act like relations"
)
publishedPosts = select(posts).where(published: true)
t.equal(
render(select(posts).where(published: true))
render(select(posts.where(published: true)))
"Selects can be restricted rather than their source"
)
t.end()
sqlFunction = (name, args) ->
args = list(args, ', ')
return {
render: (dialect) -> "#{name}(#{dialect.render(args)})"
params: list.params
}
parameter = (it) ->
render: (dialect) ->
dialect.placeholder()
params: -> [it]
prefixed = wrap (thing, prefix) ->
render: (dialect) ->
if (out = thing and dialect.render(thing)) then prefix + out else ""
integer = (input) ->
parsed = parseInt input
if isNaN parsed
throw new Error("Invalid integer: #{input}")
render: -> parsed
limit = (val) -> prefixed integer(val), 'LIMIT '
offset = (val) -> prefixed integer(val), 'OFFSET '
binary = (left, op, right) -> fixed list([left, text(op), right])
test 'binary nodes', (t) ->
[left, right] = ['left', 'right'].map(text)
t.equal(render(binary(left, 'op', right)), 'left op right')
t.end()
A list
is the most complicated kind of node, as they are an immutable
aggregate in a language that has no such concept natively. Lists are also where
the biggest performance optimization in Codd is implemented: because our lists
of nodes are immutable, we can safely memoize the result of rendering any list.
However, memoizing the rendered form of every list (and sublist) would lead to a great number of redundant substrings taking up our precious memory. To show how Codd avoids this, we need to understand a further detail of the rendering process:
When a dialect visitor calls the
.render
method of a node, it pass two arguments: itself, and acontext
object. This context object has a property,root
, that indicates whether this node is the root.
Using this, can implement a "memoRoot" decorator that memoizes the result of a render call only when the node is the root:
memoRoot = (fn) ->
result = undefined
return (dialect, ctx) ->
if ctx?.root
result ?= fn.call(this, dialect, ctx)
else
fn.call(this, dialect, ctx)
Returning to the implementation of lists, we see that the constructor takes 3 arguments: an array of child nodes, a string with which the rendered child nodes will be joined, and an optional parent list, which will be rendered in front of it's immediate children.
list = (nodes=[], glue=' ', parent=null) ->
if not nodes.length
console.trace("Empty list created")
return text('')
First we need to take a shallow copy of nodes
so that our caller can't mutate
the array underneath us.
nodes = nodes.slice()
We also create a callback function that will be used for collecting query parameters from child nodes.
childParams = ->
nodes.map((n) -> n.params?() ? []).reduce((a, b) -> a.concat(b))
Finally, we construct and return the list object itself. Note how the add
method returns a new list.
return {
render: (dialect, path) ->
prefix = (parent and dialect.render(parent) + glue) or ''
prefix + nodes.map(dialect.render).filter(Boolean).join(glue)
params: -> (parent?.params() ? []).concat(childParams())
add: (nodes...) -> list(nodes, glue, this)
slice: (a, b, c) -> nodes.slice(a, b, c)
map: (fn) -> nodes.map(fn, this)
}
Tuples and fixed lists are simple wrappers around lists.
parenthesized = augment (list) ->
original = list.render
list.render = -> '(' + original.apply(@, arguments) + ')'
fixed = augment (list) -> list.add = null
tuple = (nodes...) -> parenthesized list(nodes, ', ')
tag = (names...) ->
node = names.pop()
for name in names
node['is' + name] = true
node
The defaultDialect visitor doesn't do anything too exciting, just initializes
some global state to track the current location in the tree, and passes itself
to the given nodes render
method.
defaultDialect =
createVisitor: ->
nodepath = []
placeholderCount = 1
return visitor =
quote: (s) -> '"' + s + '"'
placeholder: -> "$#{placeholderCount++}"
render: (node) ->
nodepath.push(node)
debugger unless node.render
out = node.render(visitor, root: !nodepath.length)
nodepath.pop(node)
return out