Adhering to the basic rules in this style guide will improve our ability to share, maintain, and extend our research when working with SQL.
This document is written as a manual for anyone working on the Data team, but also as a guide for anyone at the company who would like to write clean and clear code that is meant to be shared.
The individual tips in this guide are based on a composite of knowledge we've gleaned from experience and our roles at previous jobs.
NOTE: This style guide is written for use with Snowflake/Postgres 8.0.2, but much of it can be applied to any SQL database.
- We take a practical approach to writing code.
- We believe consistency in style is very important.
- We demonstrate intent explicitly in code, via clear structure and comments where needed.
-
Keep tab length at 2 spaces (what's set in Mode). So if you use a code editor, make sure the tab length is set to 2 spaces.
-
No trailing whitespace.
-
Better to capitalize SQL keywords, not mandatory though. In any case, be consistent in your choice to capitalize or not in a single query. Moreover, never use "Format SQL" in Mode to capitalize, it messes with the whole style.
-
Variable names should be underscore separated:
GOOD:
SELECT COUNT(*) AS tenants_count
NOT GOOD:
SELECT COUNT(*) AS tenantsCount
-
Comments should go near the top of your query, or at least near the closest
SELECT
-
Try to only comment on things that aren't obvious about the query (e.g., why a particular ID is hardcoded, etc.)
-
Don't use single letter variable names, be as descriptive as possible given the context:
GOOD:
SELECT date_trunc('week', time) AS week
NOT GOOD:
SELECT date_trunc('week', time) AS w
-
(IMPORTANT) Always specify the table of origin when you select from the database and not from a subquery, i.e. use the [table].[column] syntax (in
SELECT
,JOIN
andWHERE
statements alike):
SELECT
listings.id,
a_random_column_from_a_random_subquery
FROM listings
JOIN a_random_subquery
ON ...
- Use
WITH
statement early and often, and name subqueries well. - Inside a subquery, you may add blanklines to improve readability, but only between different statements, and only one blank line. For example:
GOOD:
SELECT
id,
created_at,
tenant_id
FROM listings
NOT GOOD:
SELECT
id,
created_at,
tenant_id
FROM listings
- Align all columns to the first column on their own line:
SELECT
projects.name,
users.email,
projects.country,
COUNT(backings.id) AS backings_count
FROM ...
SELECT
goes on its own line, possibly followed by aDISTINCT
SELECT [DISTINCT]
name,
...
- Always rename aggregates and function-wrapped columns:
SELECT
name,
SUM(amount) AS sum_amount
FROM ...
- Always use
AS
to rename columns:
GOOD:
SELECT
projects.name AS project_name,
COUNT(backings.id) AS backings_count
...
NOT GOOD:
SELECT
projects.name project_name,
COUNT(backings.id) backings_count
...
- Long functions should be split across multiple lines: e.g. for a window function: one for the
PARTITION
,ORDER
and frame clauses, each at an additional indentation level. Go to a new line when opening and closing the brackets, and return to the original indentation level when closing it and giving an alias.
SUM(1) OVER (
PARTITION BY category_id,year
ORDER BY pledged DESC
ROWS UNBOUNDED PRECEDING
) AS category_year
- The
ON
keyword and condition goes on the line after theJOIN
line, with an indent:
SELECT
projects.name AS project_name,
COUNT(backings.id) AS backings_count
FROM ksr.projects AS projects
JOIN ksr.backings AS backings
ON projects.id = backings.project_id
...
- Additional filters in the
JOIN
go on new indented lines:
SELECT
projects.name AS project_name,
COUNT(backings.id) AS backings_count
FROM ksr.projects AS projects
INNER JOIN ksr.backings AS backings
ON projects.id = backings.project_id
AND backings.project_country != 'US'
...
Multiple WHERE
clauses should go on different lines and begin with the SQL operator (just like a JOIN
):
SELECT
name,
goal
FROM ksr.projects AS projects
WHERE
country = 'US'
AND deadline >= '2015-01-01'
...
Align together:
CASE
andEND
WHEN
andELSE
THEN
if theWHEN... THEN
statement is too long and should take two lines for at least one condition
CASE
WHEN category = 'Art'
THEN backer_id
ELSE NULL
END AS alias
WITH
clause subqueries are an efficient way of defining tables that can be used throughout the execution of a single query. In all cases, the same results can be achieved by using subqueries in the main body of theSELECT
statement, butWITH
clause subqueries may be simpler to write and read.
The body of a CTE must be one indent further than the WITH
keyword. Open them at the end of a line and close them on a new line:
WITH
backings_per_category AS (
SELECT
category_id,
deadline,
...
)
- Multiple CTEs should be formatted this way:
WITH
on the first line and separated by at least one line from the first sub-query- Each subquery separated by 2 blank lines
WITH
backings_per_category AS (
SELECT
...
),
backers AS (
SELECT
...
)
SELECT * FROM backers;