Skip to content

Instantly share code, notes, and snippets.

@TLouf
Forked from fredbenenson/kickstarter_sql_style_guide.md
Last active April 20, 2018 04:20
Show Gist options
  • Save TLouf/0119d4d54aa944b50ea7d9377966adfa to your computer and use it in GitHub Desktop.
Save TLouf/0119d4d54aa944b50ea7d9377966adfa to your computer and use it in GitHub Desktop.
HousingAnywhere SQL Style Guide

Purpose

Adhering to the basic rules in this style guide will improve our ability to share, maintain, and extend our research when working with SQL.

This document is written as a manual for anyone working on the Data team, but also as a guide for anyone at the company who would like to write clean and clear code that is meant to be shared.

The individual tips in this guide are based on a composite of knowledge we've gleaned from experience and our roles at previous jobs.

NOTE: This style guide is written for use with Snowflake/Postgres 8.0.2, but much of it can be applied to any SQL database.

Principles

  • We take a practical approach to writing code.
  • We believe consistency in style is very important.
  • We demonstrate intent explicitly in code, via clear structure and comments where needed.

Rules

General stuff

  • Keep tab length at 2 spaces (what's set in Mode). So if you use a code editor, make sure the tab length is set to 2 spaces.

  • No trailing whitespace.

  • Better to capitalize SQL keywords, not mandatory though. In any case, be consistent in your choice to capitalize or not in a single query. Moreover, never use "Format SQL" in Mode to capitalize, it messes with the whole style.

  • Variable names should be underscore separated:

    GOOD: SELECT COUNT(*) AS tenants_count

    NOT GOOD: SELECT COUNT(*) AS tenantsCount

  • Comments should go near the top of your query, or at least near the closest SELECT

  • Try to only comment on things that aren't obvious about the query (e.g., why a particular ID is hardcoded, etc.)

  • Don't use single letter variable names, be as descriptive as possible given the context:

    GOOD: SELECT date_trunc('week', time) AS week

    NOT GOOD: SELECT date_trunc('week', time) AS w

  • (IMPORTANT) Always specify the table of origin when you select from the database and not from a subquery, i.e. use the [table].[column] syntax (in SELECT, JOIN and WHERE statements alike):

SELECT
  listings.id,
  a_random_column_from_a_random_subquery
FROM listings
JOIN a_random_subquery
  ON ...
  • Use WITH statement early and often, and name subqueries well.
  • Inside a subquery, you may add blanklines to improve readability, but only between different statements, and only one blank line. For example:

GOOD:

SELECT 
  id,
  created_at,
  tenant_id
  
FROM listings

NOT GOOD:

SELECT 

  id,

  created_at,
  tenant_id
  
  
FROM listings

SELECT

  • Align all columns to the first column on their own line:
SELECT
  projects.name,
  users.email,
  projects.country,
  COUNT(backings.id) AS backings_count
FROM ...
  • SELECT goes on its own line, possibly followed by a DISTINCT
SELECT [DISTINCT]
  name,
  ...
  • Always rename aggregates and function-wrapped columns:
SELECT
  name,
  SUM(amount) AS sum_amount
FROM ...
  • Always use AS to rename columns:

GOOD:

SELECT
  projects.name AS project_name,
  COUNT(backings.id) AS backings_count
...

NOT GOOD:

SELECT
  projects.name project_name,
  COUNT(backings.id) backings_count
...
  • Long functions should be split across multiple lines: e.g. for a window function: one for the PARTITION, ORDER and frame clauses, each at an additional indentation level. Go to a new line when opening and closing the brackets, and return to the original indentation level when closing it and giving an alias.
SUM(1) OVER (
  PARTITION BY category_id,year            
  ORDER BY pledged DESC
  ROWS UNBOUNDED PRECEDING
) AS category_year

JOIN

  • The ON keyword and condition goes on the line after the JOIN line, with an indent:
SELECT
  projects.name AS project_name,
  COUNT(backings.id) AS backings_count
FROM ksr.projects AS projects
JOIN ksr.backings AS backings 
  ON projects.id = backings.project_id
...
  • Additional filters in the JOIN go on new indented lines:
SELECT
  projects.name AS project_name,
  COUNT(backings.id) AS backings_count
FROM ksr.projects AS projects
INNER JOIN ksr.backings AS backings 
  ON projects.id = backings.project_id
  AND backings.project_country != 'US'
...

WHERE

Multiple WHERE clauses should go on different lines and begin with the SQL operator (just like a JOIN):

SELECT
  name,
  goal
FROM ksr.projects AS projects
WHERE
  country = 'US'
  AND deadline >= '2015-01-01'
...

CASE

Align together:

  • CASE and END
  • WHEN and ELSE
  • THEN if the WHEN... THEN statement is too long and should take two lines for at least one condition
CASE 
  WHEN category = 'Art'
    THEN backer_id
  ELSE NULL
END AS alias

Common Table Expressions (CTEs)

  • WITH clause subqueries are an efficient way of defining tables that can be used throughout the execution of a single query. In all cases, the same results can be achieved by using subqueries in the main body of the SELECT statement, but WITH clause subqueries may be simpler to write and read.

The body of a CTE must be one indent further than the WITH keyword. Open them at the end of a line and close them on a new line:

WITH 

backings_per_category AS (
  SELECT
    category_id,
    deadline,
    ...
)
  • Multiple CTEs should be formatted this way:
    • WITH on the first line and separated by at least one line from the first sub-query
    • Each subquery separated by 2 blank lines
WITH 

backings_per_category AS (
  SELECT
    ...
), 


backers AS (
  SELECT
    ...
)


SELECT * FROM backers;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment