Skip to content

Instantly share code, notes, and snippets.

@yeesian
Last active August 14, 2016 23:34
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save yeesian/0be9e87ef20c1ee5708de468671f9fea to your computer and use it in GitHub Desktop.
Save yeesian/0be9e87ef20c1ee5708de468671f9fea to your computer and use it in GitHub Desktop.
On writing yet-another-SQL-generator

Query Optimization

Document notes on certain design decisions.

Different Dialects

https://en.wikibooks.org/wiki/SQL_Dialects_Reference

Unsupported

We currently do not support

Correlated Subqueries

(explained here) by banning the writing/piping of subqueries into the SELECT or WHERE clauses of the SQL expression.

Recursive CTEs

(CTEs = "Common Table Expressions". explained here). See http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/


Recommendations

We want to provide mechanisms to promote:

Being more familiar with the query-planning and order-of-execution of a SQL statement

see

Subquery flattening

(read: Section 10 of https://www.sqlite.org/optoverview.html)

Query flattening is an important optimization when views are used as each use of a view is translated into a subquery.

When a subquery occurs in the FROM clause of a SELECT, the simplest behavior is to evaluate the subquery into a transient table, then run the outer SELECT against the transient table. But such a plan can be suboptimal since the transient table will not have any indices and the outer query (which is likely a join) will be forced to do a full table scan on the transient table.

To overcome this problem, SQLite attempts to flatten subqueries in the FROM clause of a SELECT. This involves inserting the FROM clause of the subquery into the FROM clause of the outer query and rewriting expressions in the outer query that refer to the result set of the subquery. For example:

  SELECT a FROM (SELECT x+y AS a FROM t1 WHERE z<100) WHERE a>5

Would be rewritten using query flattening as:

  SELECT x+y AS a FROM t1 WHERE z<100 AND a>5

A sufficient condition is to avoid writing:

  1. DISTINCT, LIMIT, OFFSET, ORDER BY
  2. compound select statements
  3. aggregates

in the subquery. For more advanced use cases, there is a long list of conditions that must all be met in order for query flattening to occur:

  1. The subquery and the outer query do not both use aggregates.

  2. The subquery is not an aggregate or the outer query is not a join.

  3. The subquery is not the right operand of a left outer join.

  4. The subquery is not DISTINCT.

  5. The subquery does not use aggregates or the outer query is not DISTINCT.

  6. The subquery has a FROM clause.

  7. The subquery does not use LIMIT or the outer query is not a join.

  8. The subquery does not use LIMIT or the outer query does not use aggregates.

  9. The subquery does not use aggregates or the outer query does not use LIMIT.

  10. The subquery and the outer query do not both have ORDER BY clauses.

  11. The subquery and outer query do not both use LIMIT.

  12. The subquery does not use OFFSET.

  13. The outer query is not part of a compound select or the subquery does not have a LIMIT clause.

  14. The outer query is not an aggregate or the subquery does not contain ORDER BY.

  15. The sub-query is not a compound select, or it is a UNION ALL compound clause made up entirely of non-aggregate queries, and the parent query:

    • is not itself part of a compound select,
    • is not an aggregate or DISTINCT query, and
    • is not a join.

    The parent and sub-query may contain WHERE clauses. Subject to rules (3), (10) and (11), they may also contain ORDER BY, LIMIT and OFFSET clauses.

  16. If the sub-query is a compound select, then all terms of the ORDER by clause of the parent must be simple references to columns of the sub-query.

  17. The subquery does not use LIMIT or the outer query does not have a WHERE clause.

  18. If the sub-query is a compound select, then it must not use an ORDER BY clause.

  19. The subquery does not use LIMIT or the outer query is not DISTINCT.

  20. The subquery is not a recursive CTE.

  21. The parent is not a recursive CTE, or the sub-query is not a compound query.

The casual reader is not expected to understand or remember any part of the list above. The point of this list is to demonstrate that the decision of whether or not to flatten a query is complex.


Style Guide

The generated SQL statements should conform to http://www.sqlstyle.guide/ as far as possible.

  • Maintaining a "river" with keywords on the left
  • Indenting nested subqueries.
  • etc

Syntax

The Pipe source |> verb(...) operates on [left] a Symbol/QueryExpr/DataSource, and [right] a "verb" (see below).

Function names

follow under three main types:

  1. Verb: preceded by a pipe |>, and returns a QueryExpr.
  • select
  • filter
  • groupby
  • orderby
  • limit
  • leftjoin
  • outerjoin
  • innerjoin
  • crossjoin
  1. Adjectives and Prepositions: may occur as a call within a verb in (1). Mostly behaving like keyword arguments.
  • [select] distinct, all
  • [*join] on, using, natural
  • [groupby] having
  • [orderby] desc, asc
  • [limit] offset
  1. SQL Functions: operate on expressions (SQL core and aggregate functions)
  • sqrt, abs, mean, max, min, etc

SQL Translation

  • Generally, a=b will be understood as b AS a (e.g. in the setting of a SELECT clause)
  • each verb will correspond to a single SELECT statement.
  1. source |> select(x, y=f(z), ...) will translate to
SELECT x,
       f(z) AS y,
       ...
  FROM source
  1. source |> filter(condition1, condition2, ...) will translate to
SELECT *
  FROM source
 WHERE condition1
   AND condition2
   AND ...
  1. source |> groupby(column1, column2, ...) will translate to
 SELECT *
   FROM source
GROUPBY column1,
        column2,
        ...
  1. source |> orderby(column1, column2, ...) will translate to
 SELECT *
   FROM source
ORDERBY column1,
        column2,
        ...
  1. source |> limit(n) will translate to
 SELECT *
   FROM source
  LIMIT n
  1. source |> join(table, on(condition)) will translate to
 SELECT *
   FROM source
        JOIN table
          ON condition
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment