@mattmc3
Last active April 8, 2024 22:53
Modern SQL Style Guide
layout: default
author: mattmc3
title: Modern SQL Style Guide
revision: 2019-01-17
version: 1.0.1
description: A guide to writing clean, clear, and consistent SQL.

Modern SQL Style Guide

select *
  from modern.sql_style_guide as guide
 where guide.attributes in ('clean', 'clear', 'consistent')
   and guide.look = 'beautiful'

Purpose

These guidelines are designed to make SQL statements easy to write, easy to read, easy to maintain, and beautiful to see. This document is to be used as a guide for anyone who would like to codify a team's preferred SQL style.

This guide is opinionated in some areas and relaxed in others. You can use this set of guidelines, fork them, or make your own; the key is that you pick a style and stick to it. The odds of making everyone happy are low, so compromise is a guiding principle to achieve cohesion.

It is easy to include this guide in Markdown format as a part of a project's code base or reference it here for anyone on the project to freely read.

This guide is based on various existing attempts at SQL standards, including http://www.sqlstyle.guide and the Kickstarter guide. Due to its origins, it is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The example SQL statements used are based on tables in the AdventureWorks database. Note that due to the use of the existing AdventureWorks schema, some of the guidelines in this document are not always followed, especially with regard to naming conventions. Those discrepancies will be called out as they appear.

NOTE: This style guide is written for use with Microsoft SQL Server, but much of it can be applied to any SQL database with some simple modifications.

Principles

  • We take a disciplined and practical approach to writing code.
  • We treat SQL like any other source code, which should be checked into source control, peer reviewed, and properly maintained.
  • We believe consistency in style is important, and we value craftsmanship, but not to the exclusion of other practical concerns.
  • We demonstrate intent explicitly in code, via clear structure and comments where needed.
  • We adhere to a consistent style for handwritten SQL so that our code can thrive in an environment with many authors, editors, and readers.

Quick look

Before getting into all the specifics, here is a quick look at some examples showing well-formatted, beautiful SQL that matches the recommendations in this style guide:

-- basic select example
select p.Name as ProductName
     , p.ProductNumber
     , pm.Name as ProductModelName
     , p.Color
     , p.ListPrice
  from Production.Product as p
  join Production.ProductModel as pm
    on p.ProductModelID = pm.ProductModelID
 where p.Color in ('Blue', 'Red')
   and p.ListPrice < 800.00
   and pm.Name like '%frame%'
 order by p.Name

-- basic insert example
insert into Sales.Currency (CurrencyCode, Name, ModifiedDate)
values ('XBT', 'Bitcoin', getutcdate())
     , ('ETH', 'Ethereum', getutcdate())

-- basic update example
update p
   set p.ListPrice = p.ListPrice * 1.05
     , p.ModifiedDate = getutcdate()
  from Production.Product as p
 where p.SellEndDate is null
   and p.SellStartDate is not null

-- basic delete example
delete cc
  from Sales.CreditCard as cc
 where cc.ExpYear < 2003
   and cc.ModifiedDate < dateadd(year, -1, getutcdate())

Rules

General guidance

  • Favor using a "river" for vertical alignment so that a query can be quickly and easily scanned by a new reader.

  • Comments should appear at the top of your query or script, and should explain the intent of the query, not the mechanics.

  • Try to comment things that aren't obvious about the query (e.g., why a particular filter is necessary, why an optimization trick was needed, etc.)

  • Favor being descriptive over terseness:

    GOOD: select emp.LoginID as EmployeeUserName

    BAD: select emp.LoginID as EmpUsrNm

  • Follow any existing style in the script before applying this style guide. The SQL script should have one clear style, and these rules should not be applied to existing scripts unless the whole script is being changed to adhere to the same style.

  • Favor storing datetime and datetime2 in UTC unless embedding timezone information (datetimeoffset) so that times are clear and convertible. Use ISO-8601 compliant time and date information (YYYY-MM-DD HH:MM:SS.SSSSS) when referring to date/time data.
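On the application side, a minimal Python sketch of producing that format might look like the following. `to_iso_utc` is a hypothetical helper, not part of any library; it refuses naive datetimes so that the UTC conversion is never a guess. Note that `%f` yields six fractional digits (microseconds), while SQL Server's `datetime2` can store up to seven.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(moment: datetime) -> str:
    """Hypothetical helper: render an aware datetime as 'YYYY-MM-DD HH:MM:SS.ffffff' in UTC."""
    if moment.tzinfo is None:
        # A naive datetime carries no offset, so converting it to UTC would be a guess.
        raise ValueError("naive datetime: attach a timezone before converting")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")


# 08:30 at UTC-05:00 is 13:30 in UTC.
eastern = timezone(timedelta(hours=-5))
print(to_iso_utc(datetime(2019, 1, 17, 8, 30, 0, 123456, tzinfo=eastern)))
# prints: 2019-01-17 13:30:00.123456
```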

Casing

Do not SHOUTCASE or "Sentence case" SQL keywords (e.g., prefer select, not SELECT or Select). SHOUTCASED SQL is an anachronism, and is not appropriate for modern SQL development. Using lowercase keywords is preferred because:

  • UPPERCASE words are harder to type and harder to read.
  • SQL keywords are not case-sensitive, so lowercase keywords work correctly in all variants of SQL.
  • No other modern languages use ALLCAPS keywords.
  • Modern editors color code SQL keywords, so there is not a need to distinguish keywords by casing.
  • If you are in an environment where your keywords are not colored (e.g. SQL embedded as a string in another language), using a river for formatting provides a similar benefit of highlighting important keywords without resorting to CAPS.
  • UPPERCASE IS ASSOCIATED WITH SHOUTING WHEN SEEN IN TEXT, IS HARD TO READ, AND MAKES SQL FEEL MORE LIKE COBOL THAN A MODERN LANGUAGE.

If the SQL script you are editing already uses SHOUTCASE keywords, match that style or change all keywords to lowercase. Favor bending the rules for the sake of consistency rather than mixing styles.
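Keyword case has no effect on how a query runs. A quick check with Python's built-in `sqlite3` module (SQLite standing in for SQL Server; the `products` table and its rows are made up for illustration) shows the same query in three casings returning identical rows. Identifier case is a separate matter: whether `ListPrice` and `listprice` match depends on the engine, and in SQL Server on the collation.

```python
import sqlite3

# Hypothetical in-memory table, used only to compare keyword casings.
conn = sqlite3.connect(":memory:")
conn.execute("create table products (name text, list_price real)")
conn.executemany(
    "insert into products (name, list_price) values (?, ?)",
    [("Frame", 800.0), ("Wheel", 120.0)],
)

queries = [
    "select name from products where list_price < 500 order by name",
    "SELECT name FROM products WHERE list_price < 500 ORDER BY name",
    "Select name From products Where list_price < 500 Order By name",
]
results = [conn.execute(query).fetchall() for query in queries]

# Every casing parses to the same statement and returns the same rows.
assert results[0] == results[1] == results[2] == [("Wheel",)]
print(results[0])  # [('Wheel',)]
```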

Naming guidance

  • Names should be underscore_separated or PascalCase, but do not mix styles.

    GOOD: select count(*) as the_tally, sum(line_total) as the_total ...

    BAD: select count(*) as The_Tally, sum(line_total) as theTotal ...

Tables

  • Do not use reserved words for table names if possible.

  • Prefer the shortest commonly understood words to name a table.

  • Naming a table as a plural makes the table easier to speak about. (e.g. favor employees over employee)

  • Do not use object prefixes or Hungarian notation (e.g. sp_, prc_, vw_, tbl_, t_, fn_, etc).

  • Tables with semantic prefixes are okay if they aid understanding the nature of a table (e.g. in a Data Warehouse where it is common to use prefixes like Dim and Fact).

  • Avoid giving a table the same name as one of its columns.

  • Use a joining word (e.g. xref) for many-to-many joining tables (cross references) rather than concatenating table names:

    GOOD: drivers_xref_cars

    BAD: drivers_cars

  • Tables should always have a primary key. A single column, auto-number (identity) surrogate key is preferable.

  • Natural keys or composite keys can be enforced with unique constraints in lieu of making them a primary key.

  • Composite keys make for verbose and slow foreign key joins. int/bigint primary keys are optimal as foreign keys when a table gets large.

  • Tables should always have created_at and updated_at metadata fields in them to make data movement between systems easier (ETL). Also, consider storing deleted records in archival tables, or having a deleted_at field for soft deletes.

  • Don't forget the needs of data analysts and ETL developers when designing your model.
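Several of the table rules above can be seen together in one small table definition. This is a sketch run through Python's `sqlite3` module with a hypothetical `employees` table: SQLite's `integer primary key` plays the role of an auto-numbered surrogate key, a `unique` constraint enforces the natural key, and `deleted_at` supports soft deletes. In SQL Server the rough equivalents would be an `int identity` primary key and `datetime2` columns defaulting to `sysutcdatetime()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table employees (
      employee_id integer primary key                     -- surrogate key, auto-assigned by SQLite
    , login_name  text not null unique                    -- natural key enforced by a unique constraint
    , full_name   text not null
    , created_at  text not null default current_timestamp -- current_timestamp is UTC in SQLite
    , updated_at  text not null default current_timestamp
    , deleted_at  text                                    -- stays null until the row is soft-deleted
);
""")

conn.execute(
    "insert into employees (login_name, full_name) values (?, ?)", ("jdoe", "Jane Doe")
)

# A second row with the same natural key is rejected, even though it would get a new employee_id.
duplicate_rejected = False
try:
    conn.execute(
        "insert into employees (login_name, full_name) values (?, ?)", ("jdoe", "John Doe")
    )
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Soft delete: stamp deleted_at (and updated_at) rather than removing the row.
conn.execute("""
update employees
   set deleted_at = current_timestamp
     , updated_at = current_timestamp
 where login_name = ?
""", ("jdoe",))

total = conn.execute("select count(*) from employees").fetchone()[0]
active = conn.execute("select count(*) from employees where deleted_at is null").fetchone()[0]
print(duplicate_rejected, total, active)  # True 1 0
```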

Columns

  • Do not use reserved words for column names if possible.
  • Avoid using a bare id as the name of a table's primary identifier, if possible.
  • Do not add a column with the same name as its table and vice versa.
  • Avoid common words like Name, Description, etc. Prefer a descriptive prefix for those words so that they don't require aliases when joined to other tables with similarly named columns. (NOTE: This guide uses the AdventureWorks database, which commonly has columns named Name against this guide's advice. Remember that an existing convention may be in place that is beyond your control.)
  • Do not use Desc as an abbreviation for Description. Spell it out, or use some other non-keyword.

Aliases

  • Aliases should relate in some way to the object or expression they are aliasing.
  • As a rule of thumb the alias can be the first letter of each word in the object's name or a good abbreviation.
  • If there is already an alias with the same name then append a number.
  • When using a subquery, prefix aliases with an _ to differentiate them from aliases in the outer query.
  • Always include the as keyword. It makes the query easier to read and is explicit.
  • For computed data (e.g. sum() or avg()) use the name you would give it if it were a column defined in the schema.

Whitespace

  • No tabs. Use spaces for indents.
  • Configure your editor to 4 spaces per indent, but prefer your SQL to indent to the "river", and not to a set indent increment.
  • No trailing whitespace.
  • No more than two blank lines between statements.
  • No empty lines in the middle of a single statement.
  • One final newline at the end of a file.
  • Use an .editorconfig file to enforce reasonable whitespace rules if your SQL editor supports it:
# .editorconfig is awesome: https://EditorConfig.org

# SQL files
[*.{sql,tsql,ddl}]
charset = utf-8
indent_style = space
indent_size = 4
end_of_line = crlf
trim_trailing_whitespace = true
insert_final_newline = true
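Where an editor does not honor `.editorconfig`, the same rules can be checked in a script, for example as a pre-commit step. The following is a hypothetical Python sketch (`check_whitespace` is not an existing tool) that covers only the mechanical rules: tabs, trailing whitespace, runs of more than two blank lines, and the single final newline. It does not try to detect blank lines inside a statement, since that would require parsing SQL.

```python
from typing import List


def check_whitespace(sql_text: str) -> List[str]:
    """Hypothetical linter: report whitespace-rule violations in the text of a .sql file."""
    problems = []
    blank_run = 0
    # splitlines() accepts both LF and CRLF line endings.
    for number, line in enumerate(sql_text.splitlines(), start=1):
        if "\t" in line:
            problems.append(f"line {number}: tab character")
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
        blank_run = blank_run + 1 if line.strip() == "" else 0
        if blank_run == 3:  # report once per run of blank lines
            problems.append(f"line {number}: more than two blank lines")
    # Exactly one newline (LF or CRLF) should end the file.
    ending = sql_text[len(sql_text.rstrip("\r\n")):]
    if ending == "":
        problems.append("missing final newline")
    elif ending not in ("\n", "\r\n"):
        problems.append("extra blank lines at end of file")
    return problems


print(check_whitespace("select 1 \n\tfrom t\n\n\n\nselect 2"))
# ['line 1: trailing whitespace', 'line 2: tab character', 'line 5: more than two blank lines', 'missing final newline']
```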

River formatting

Spaces may be used to line up the code so that the root keywords all end on the same character boundary. This forms a "river" down the middle, making it easy for the reader's eye to scan over the code and separate the keywords from the implementation detail. Rivers are bad in typography, but helpful here. Celko's book describes using a river to vertically align your query. Right-align keywords to the river if you choose to use one. The on clause in the from may have its own river to help align information vertically. Subqueries should create their own river as well.

-- a river in the 7th column helps vertical readability
select prdct.Name as ProductName
     , prdct.ListPrice
     , prdct.Color
     , cat.Name as CategoryName
     , subcat.Name as SubcategoryName
  from Production.Product as prdct
  left join Production.ProductSubcategory as subcat
    on prdct.ProductSubcategoryID = subcat.ProductSubcategoryID
  left join Production.ProductCategory as cat
    on subcat.ProductCategoryID = cat.ProductCategoryID
 where prdct.ListPrice <= 1000.00
   and prdct.ProductID not in (
           select _pd.ProductID
             from Production.ProductDocument as _pd
            where _pd.ModifiedDate < dateadd(year, -1, getutcdate())
       )
   and prdct.Color in ('Black', 'Red', 'Silver')
 order by prdct.ListPrice desc, prdct.Name

-- alternately, a river in a different column is fine if that is preferred
-- due to longer keywords, but know that indenting can feel "off" if the
-- `select` is not in the first column for the query
   select prdct.Name as ProductName
        , prdct.ListPrice
        , prdct.Color
        , cat.Name as CategoryName
        , subcat.Name as SubcategoryName
     from Production.Product as prdct
left join Production.ProductSubcategory as subcat
       on prdct.ProductSubcategoryID = subcat.ProductSubcategoryID
left join Production.ProductCategory as cat
       on subcat.ProductCategoryID = cat.ProductCategoryID
    where prdct.ListPrice <= 1000.00
      and prdct.ProductID not in (
              select _pd.ProductID
                 from Production.ProductDocument as _pd
               where _pd.ModifiedDate < dateadd(year, -1, getutcdate())
          )
      and prdct.Color in ('Black', 'Red', 'Silver')
 order by prdct.ListPrice desc, prdct.Name

Indent formatting

Using a river can be tedious, so if this alignment is not preferred by your team, then a standard 4 space indent can be used in place of a river.

Major keywords starting a clause should occupy their own line. Major keywords are:

  • Select statement
    • select
    • into
    • from
    • where
    • group by
    • having
    • order by
  • Insert statement additions
    • insert into
    • values
  • Update statement additions
    • update
    • set
  • Delete statement additions
    • delete

All other keywords are minor and should appear after the indent and not occupy a line to themselves. Other than this section, this guide will stick to showing "river" formatting examples.

-- Editors tend to handle indenting style better than river alignment. River
-- formatting has advantages over indent formatting, but this style is
-- acceptable.
select
    prdct.Name as ProductName
    ,prdct.ListPrice
    ,prdct.Color
    ,cat.Name as CategoryName
    ,subcat.Name as SubcategoryName
from
    Production.Product as prdct
    left join Production.ProductSubcategory as subcat
        on prdct.ProductSubcategoryID = subcat.ProductSubcategoryID
    left join Production.ProductCategory as cat
        on subcat.ProductCategoryID = cat.ProductCategoryID
where
    prdct.ListPrice <= 1000.00
    and prdct.Color in ('Black', 'Red', 'Silver')
order by
    prdct.ListPrice desc, prdct.Name

select clause

Put the first column on the same line as select, and put each subsequent column on its own line, aligned to the river.

select prdct.Color
     , cat.Name as CategoryName
     , count(*) as ProductCount
  from ...

If three or fewer columns are selected, have short names, and don't need to be aliased, you may choose to have them occupy the same line for brevity.

-- shortcut for small columns
select p.Color, c.Name, p.ListPrice
  from ...

If using a select modifier like distinct or top, put the first column on its own line.

-- treat the first column differently when using distinct and top
select distinct
       p.Color
     , c.Name as CategoryName
  from ...

Use commas as a prefix as opposed to a suffix. This is preferred because:

  • It makes it easy to add new columns to the end of the column list, which is more common than at the beginning
  • It prevents unintentional aliasing bugs (missing comma)
  • It makes commenting out columns at the end easier
  • When statements take multiple lines like windowing functions and case statements, the prefix comma makes it clear when a new column starts
  • It does not adversely affect readability

The comma should border the "river" on the keyword side.

GOOD:

select Name
     , ListPrice
     , Color
     , CategoryName
   ...

BAD:

-- whoops! forgot a trailing comma because it's hard to see, making an
-- accidental alias of `ListPrice Color`
select Name,
       ListPrice
       Color,
       CategoryName
   ...
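The missing-comma bug is easy to reproduce because the broken query is still valid SQL. In this sketch (Python's `sqlite3` module with a made-up `products` table), the suffix-comma query silently returns three columns, one of them the list price under the name `Color`, without raising any error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table products (Name text, ListPrice real, Color text, CategoryName text)")
conn.execute("insert into products values ('Frame', 800.0, 'Red', 'Components')")

# Suffix commas with one missing: `ListPrice Color` parses as ListPrice aliased to Color.
buggy = conn.execute("""
select Name,
       ListPrice
       Color,
       CategoryName
  from products
""")
buggy_columns = [col[0] for col in buggy.description]
buggy_row = buggy.fetchone()

# Prefix commas: every new column starts with a comma in the same visible column.
fixed = conn.execute("""
select Name
     , ListPrice
     , Color
     , CategoryName
  from products
""")
fixed_columns = [col[0] for col in fixed.description]

print(buggy_columns, buggy_row)  # ['Name', 'Color', 'CategoryName'] ('Frame', 800.0, 'Components')
print(fixed_columns)             # ['Name', 'ListPrice', 'Color', 'CategoryName']
```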

Always use as to rename columns. The as keywords can also be lined up for additional vertical alignment, but don't have to be:

GOOD:

select prdct.Color as ProductColor
     , cat.Name    as CategoryName
     , count(*)    as ProductCount
  from ...
...

BAD:

select prdct.Color ProductColor
     , cat.Name CategoryName
     , count(*) ProductCount
  from ...
...

Always rename aggregates, derived columns (e.g. case statements), and function-wrapped columns:

select ProductName
     , sum(UnitPrice * OrderQty) as TotalCost
     , getutcdate() as NowUTC
  from ...

Always use table alias prefixes for all columns when querying from more than one table. Single character aliases are fine for a few tables, but are less likely to be clear as a query grows:

select prdct.Color
     , subcat.Name as SubcategoryName
     , count(*) as ItemCount
  from Production.Product as prdct
  left join Production.ProductSubcategory as subcat
    on ...

Do not bracket-escape table or column names unless the names collide with keywords or would otherwise cause a syntax error.

GOOD:

-- owner and status are keywords
select Title
     , [Owner]
     , [Status]
  from Production.Document

BAD:

-- extra brackets are messy and unnecessary
select [Title]
     , [Owner]
     , [Status]
  from [Production].[Document]

Windowing functions

Long window functions should be split across multiple lines: one for each clause, aligned with a river. Partition keys can share the same line or be split. Ascending order is an intuitive default, so an explicit asc is unnecessary, whereas desc must be stated. All window functions should be aliased.

select p.ProductID
     , p.Name as ProductName
     , p.ProductNumber
     , p.ProductLine
     , row_number() over (partition by p.ProductLine
                                     , left(p.ProductNumber, 2)
                              order by right(p.ProductNumber, 4) desc) as SequenceNum
     , p.Color
  from Production.Product as p
 order by p.ProductLine
        , left(p.ProductNumber, 2)
        , SequenceNum
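The window above can be exercised on a few sample rows. This sketch uses Python's `sqlite3` module and assumes SQLite 3.25 or later (the first release with window functions). SQLite has no `left()`/`right()` functions, so `substr()` stands in for them, and the product numbers are invented sample data. Because the window's `order by` is `desc`, the highest suffix in each partition gets `SequenceNum` 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table products (ProductNumber text, ProductLine text)")
conn.executemany(
    "insert into products (ProductNumber, ProductLine) values (?, ?)",
    [("BK-R50B-44", "R"), ("BK-R50B-48", "R"), ("FR-R92B-58", "R"), ("BK-M68S-38", "M")],
)

# substr(x, 1, 2) ~ left(x, 2); substr(x, -4) ~ right(x, 4) in T-SQL.
rows = conn.execute("""
select p.ProductNumber
     , row_number() over (partition by p.ProductLine
                                     , substr(p.ProductNumber, 1, 2)
                              order by substr(p.ProductNumber, -4) desc) as SequenceNum
  from products as p
 order by p.ProductLine
        , substr(p.ProductNumber, 1, 2)
        , SequenceNum
""").fetchall()

for product_number, sequence_num in rows:
    print(product_number, sequence_num)
```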

case statements

case statements aren't always easy to format but try to align when, then, and else together inside case and end.

then can stay on the when line if needed, but aligning with else is preferable.

select dep.Name as DepartmentName
     , case when dep.Name in ('Engineering', 'Tool Design', 'Information Services')
            then 'Information Technology'
            else dep.GroupName
       end as NewGroupName
  from HumanResources.Department as dep
 order by NewGroupName, DepartmentName

from clause

Only one table should be in the from part. Never use comma separated from-joins:

GOOD:

select cust.AccountNumber
     , sto.Name as StoreName
  from Sales.Customer as cust
  join Sales.Store as sto
    on cust.StoreID = sto.BusinessEntityID
...

BAD:

select cust.AccountNumber
     , sto.Name as StoreName
  from Sales.Customer as cust, Sales.Store as sto
 where cust.StoreID = sto.BusinessEntityID
...
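With the filter in place the two forms return the same rows; the difference is what happens when the join condition goes missing. In this sketch (Python's `sqlite3` module; the `customers` and `stores` tables and their rows are invented), dropping the `where` from the comma form still runs and quietly pairs every customer with every store. With an explicit `join`, the condition sits next to the table it joins, and SQL Server rejects an inner `join` that has no `on` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customers (CustomerID integer primary key, AccountNumber text, StoreID integer);
create table stores (BusinessEntityID integer primary key, Name text);
insert into customers values (1, 'AW001', 10), (2, 'AW002', 20), (3, 'AW003', 10);
insert into stores values (10, 'Bike World'), (20, 'Cycle Hub');
""")

explicit_join = conn.execute("""
select cust.AccountNumber
     , sto.Name as StoreName
  from customers as cust
  join stores as sto
    on cust.StoreID = sto.BusinessEntityID
 order by cust.AccountNumber
""").fetchall()

comma_join = conn.execute("""
select cust.AccountNumber
     , sto.Name as StoreName
  from customers as cust, stores as sto
 where cust.StoreID = sto.BusinessEntityID
 order by cust.AccountNumber
""").fetchall()

# Same rows while the where clause is present...
assert explicit_join == comma_join

# ...but without the where clause the comma form still runs, as a cross join.
cross_count = conn.execute(
    "select count(*) from customers as cust, stores as sto"
).fetchone()[0]
print(explicit_join)  # [('AW001', 'Bike World'), ('AW002', 'Cycle Hub'), ('AW003', 'Bike World')]
print(cross_count)    # 6 (3 customers x 2 stores)
```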

Favor not using the extraneous words inner or outer when joining tables. Alignment is easier without them, they don't add to the understanding of the query, and the full table list is easier to scan without excessive staggering:

GOOD:

-- this is easier to format and read
   select *
     from HumanResources.Employee as emp
     join Person.Person as per
       on emp.BusinessEntityID = per.BusinessEntityID
left join HumanResources.EmployeeDepartmentHistory as edh
       on emp.BusinessEntityID = edh.BusinessEntityID
left join HumanResources.Department as dep
       on edh.DepartmentID = dep.DepartmentID

BAD:

-- verbosity for the sake of verbosity is not helpful
-- `join` by itself always means `inner join`
-- `outer` is an unnecessary optional keyword
         select *
           from HumanResources.Employee as emp
     inner join Person.Person as per
             on emp.BusinessEntityID = per.BusinessEntityID
left outer join HumanResources.EmployeeDepartmentHistory as edh
             on emp.BusinessEntityID = edh.BusinessEntityID
left outer join HumanResources.Department as dep
             on edh.DepartmentID = dep.DepartmentID
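`join` and `inner join` are the same operation, as are `left join` and `left outer join`. A quick equivalence check in Python's `sqlite3` module, with two invented tables loosely modeled on the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table employees (BusinessEntityID integer primary key, JobTitle text);
create table department_history (BusinessEntityID integer, DepartmentID integer);
insert into employees values (1, 'Engineer'), (2, 'Buyer'), (3, 'Intern');
insert into department_history values (1, 10), (2, 20);
""")


def run(join_keyword):
    """Join the two tables using the given join keyword(s); rows come back in a fixed order."""
    return conn.execute(f"""
    select emp.JobTitle
         , edh.DepartmentID
      from employees as emp
      {join_keyword} department_history as edh
        on emp.BusinessEntityID = edh.BusinessEntityID
     order by emp.BusinessEntityID
    """).fetchall()


assert run("join") == run("inner join")
assert run("left join") == run("left outer join")
print(run("join"))       # [('Engineer', 10), ('Buyer', 20)]
print(run("left join"))  # [('Engineer', 10), ('Buyer', 20), ('Intern', None)]
```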

The on keyword and condition can go on its own line, but is easier to scan if it lines up on the join line. This is an acceptable style alternative:

-- this is an easier format to scan visually, but comes at the cost of longer
-- lines of code.
   select *
     from HumanResources.Employee as emp
     join Person.Person as per                            on emp.BusinessEntityID = per.BusinessEntityID
left join HumanResources.EmployeeDepartmentHistory as edh on emp.BusinessEntityID = edh.BusinessEntityID
left join HumanResources.Department as dep                on edh.DepartmentID = dep.DepartmentID
...

Additional filters in the join go on new indented lines. Line up using the on keyword:

GOOD:

   select emp.JobTitle
     from HumanResources.Employee as emp
left join HumanResources.EmployeeDepartmentHistory as edh
       on emp.BusinessEntityID = edh.BusinessEntityID
left join HumanResources.Department as dep
       on edh.DepartmentID = dep.DepartmentID
      and dep.Name <> dep.GroupName  -- multi-conditions start a new line
    where dep.DepartmentID is null

BAD:

   select emp.JobTitle
     from HumanResources.Employee as emp
left join HumanResources.EmployeeDepartmentHistory as edh
       on emp.BusinessEntityID = edh.BusinessEntityID
left join HumanResources.Department as dep
       on edh.DepartmentID = dep.DepartmentID and dep.Name <> dep.GroupName  -- needs a new line
    where dep.DepartmentID is null

Begin with inner joins and then list left joins, order them semantically, and do not intermingle left joins with inner joins unless necessary. Order the on clause with joining aliases referencing tables top-to-bottom:

GOOD:

select *
  from Production.Product as prd
  join Production.ProductModel as prm
    on prd.ProductModelID = prm.ProductModelID
  left join Production.ProductSubcategory as psc
    on prd.ProductSubcategoryID = psc.ProductSubcategoryID
  left join Production.ProductDocument as doc
    on prd.ProductID = doc.ProductID

BAD:

select *
  from Production.Product as prd
  left join Production.ProductSubcategory as psc
    on psc.ProductSubcategoryID = prd.ProductSubcategoryID  -- backwards
  join Production.ProductModel as prm                       -- intermingled
    on prm.ProductModelID = prd.ProductModelID              -- backwards
  left join Production.ProductDocument as doc
    on prd.ProductID = doc.ProductID

Avoid right joins, as they are usually better written as a left join:

GOOD:

select *
  from Production.Product as prd
  left join Production.ProductSubcategory as psc
    on ...

BAD:

select *
  from Production.ProductSubcategory as psc
 right join Production.Product as prd
    on ...

where clause

Multiple where conditions should go on different lines and align to the river:

select *
  from Production.Product as prd
 where prd.Weight > 2.5
   and prd.ListPrice < 1500.00
   and prd.Color in ('Blue', 'Black', 'Red')
   and prd.SellStartDate >= '2006-01-01'
...

When mixing and and or statements, do not rely on order of operations; instead, always use parentheses to make the intent clear:

select *
  from Production.Product as prd
 where (prd.Weight > 10.0
   and prd.Color in ('Red', 'Silver'))
    or prd.Color is null
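`and` binds more tightly than `or`, so the unparenthesized version happens to match the example above, but a reader has to know that rule to be sure, and the other possible grouping returns different rows. A sketch in Python's `sqlite3` module with four invented products:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table products (Name text, Weight real, Color text)")
conn.executemany(
    "insert into products (Name, Weight, Color) values (?, ?, ?)",
    [("A", 12.0, "Red"), ("B", 1.0, "Red"), ("C", 1.0, None), ("D", 12.0, "Blue")],
)


def names(condition):
    """Return product names matching a where condition (trusted literal strings only)."""
    sql = f"select prd.Name from products as prd where {condition} order by prd.Name"
    return [row[0] for row in conn.execute(sql)]


no_parens = names("prd.Weight > 10.0 and prd.Color in ('Red', 'Silver') or prd.Color is null")
and_first = names("(prd.Weight > 10.0 and prd.Color in ('Red', 'Silver')) or prd.Color is null")
or_first = names("prd.Weight > 10.0 and (prd.Color in ('Red', 'Silver') or prd.Color is null)")

print(no_parens, and_first, or_first)  # ['A', 'C'] ['A', 'C'] ['A']
```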

When using semicolons, always put each one on its own line. This prevents common errors like adding conditions to a where clause and neglecting to move the trailing semicolon:

GOOD:

-- A semicolon on its own line is clear and easy to spot when adding to a `where`
delete prd
  from Production.Product as prd
 where prd.ListPrice = 0
   and prd.Weight is null
   and prd.Size is null
;
...

BAD:

-- A trailing semicolon is sinister.
-- We added some where conditions and missed it.
-- This is a destructive bug.
delete prd
  from Production.Product as prd
 where prd.ListPrice = 0;  -- dangerous
   and prd.Weight is null  -- syntax error here, but the bad delete is valid
   and prd.Size is null
...
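The failure mode is easy to reproduce in a client that runs a script one statement at a time, as Python's `sqlite3` `executescript()` does: the truncated `delete` executes before the parser ever reaches the stray `and`. Other clients and engines may compile a whole script or batch first, so treat this as an illustration rather than a description of every tool. The rows are invented, and SQLite's plain `delete from` stands in for T-SQL's `delete prd from ... as prd` form.

```python
import sqlite3

ROWS = [
    ("Free sticker", 0, None, None),  # matches all three conditions: should be deleted
    ("Free catalog", 0, 0.5, None),   # has a weight: should be kept
    ("Frame", 800.0, 2.5, "58"),      # not free: should be kept
]


def remaining_after(script):
    """Load the sample rows into a fresh database, run the script, return surviving names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table products (Name text, ListPrice real, Weight real, Size text)")
    conn.executemany("insert into products values (?, ?, ?, ?)", ROWS)
    error = None
    try:
        conn.executescript(script)
    except sqlite3.OperationalError as err:
        error = err
    names = [row[0] for row in conn.execute("select Name from products order by Name")]
    return names, error


good, good_error = remaining_after("""
delete from products
 where ListPrice = 0
   and Weight is null
   and Size is null
;
""")

bad, bad_error = remaining_after("""
delete from products
 where ListPrice = 0;  -- the statement ends here
   and Weight is null
   and Size is null
""")

print(good, good_error)            # ['Frame', 'Free catalog'] None
print(bad, bad_error is not None)  # ['Frame'] True
```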

group by clause

Maintain the same column order as the select clause in the group by:

GOOD:

  select poh.EmployeeID
       , poh.VendorID
       , count(*) as OrderCount
       , avg(poh.SubTotal) as AvgSubTotal
    from Purchasing.PurchaseOrderHeader as poh
group by poh.EmployeeID
       , poh.VendorID

BAD:

-- messing with the 'group by' order makes it hard to scan for accuracy
  select poh.EmployeeID
       , poh.VendorID
       , count(*) as OrderCount
       , avg(poh.SubTotal) as AvgSubTotal
    from Purchasing.PurchaseOrderHeader as poh
group by poh.VendorID  -- out of order
       , poh.EmployeeID

having clause

A having clause is just a where clause for aggregate functions. The same rules for where clauses apply to having.

Example:

  select poh.EmployeeID
       , poh.VendorID
       , count(*) as OrderCount
       , avg(poh.SubTotal) as AvgSubTotal
    from Purchasing.PurchaseOrderHeader as poh
group by poh.EmployeeID
       , poh.VendorID
  having count(*) > 1
     and avg(poh.SubTotal) > 3000.00
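One way to see the split between the two clauses: where filters rows before grouping and cannot refer to aggregates, while having filters the groups afterwards. A sketch in Python's `sqlite3` module with invented purchase orders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table purchase_orders (EmployeeID integer, VendorID integer, SubTotal real)")
conn.executemany(
    "insert into purchase_orders values (?, ?, ?)",
    [(1, 100, 2000.0), (1, 100, 5000.0),   # 2 orders, average 3500: kept
     (1, 200, 9000.0),                     # only 1 order: filtered out
     (2, 100, 1000.0), (2, 100, 1500.0)],  # average 1250: filtered out
)

groups = conn.execute("""
  select poh.EmployeeID
       , poh.VendorID
       , count(*) as OrderCount
       , avg(poh.SubTotal) as AvgSubTotal
    from purchase_orders as poh
group by poh.EmployeeID
       , poh.VendorID
  having count(*) > 1
     and avg(poh.SubTotal) > 3000.00
""").fetchall()
print(groups)  # [(1, 100, 2, 3500.0)]

# An aggregate in a where clause is rejected, because rows are filtered before grouping.
try:
    conn.execute("select poh.EmployeeID from purchase_orders as poh where count(*) > 1")
    aggregate_in_where_failed = False
except sqlite3.OperationalError:
    aggregate_in_where_failed = True
print(aggregate_in_where_failed)  # True
```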

order by clause

Do not use the superfluous asc in order by statements:

GOOD:

-- asc is implied and obvious
  select per.LastName
       , per.FirstName
    from Person.Person as per
order by per.LastName
       , per.FirstName

BAD:

-- asc is clutter - it's never ambiguous when you wanted to sort ascending
  select per.LastName
       , per.FirstName
    from Person.Person as per
order by per.LastName asc  -- useless asc
       , per.FirstName asc

Ordering by column number is okay, but not preferred:

-- This is okay, but not great.
  select per.FirstName + ' ' + per.LastName as FullName
       , per.LastName + ', ' + per.FirstName as LastFirst
    from Person.Person as per
order by 2
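Both points can be checked quickly: an omitted direction sorts ascending, and a column number refers to a position in the select list, so it silently changes meaning if that list is reordered. A sketch with Python's `sqlite3` module and three invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table people (FirstName text, LastName text)")
conn.executemany(
    "insert into people values (?, ?)",
    [("Alan", "Turing"), ("Grace", "Hopper"), ("Ada", "Lovelace")],
)


def fetch(sql):
    return conn.execute(sql).fetchall()


implicit = fetch("select per.FirstName, per.LastName from people as per order by per.LastName, per.FirstName")
explicit = fetch("select per.FirstName, per.LastName from people as per order by per.LastName asc, per.FirstName asc")
by_number = fetch("select per.FirstName, per.LastName from people as per order by 2, 1")

# Swapping the select columns silently changes what `order by 2` sorts on.
swapped = fetch("select per.LastName, per.FirstName from people as per order by 2")

print(implicit)  # [('Grace', 'Hopper'), ('Ada', 'Lovelace'), ('Alan', 'Turing')]
print(swapped)   # [('Lovelace', 'Ada'), ('Turing', 'Alan'), ('Hopper', 'Grace')]
```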

The by keyword can sit on the other side of a 7th column river, but align the order by columns:

select per.FirstName
     , per.LastName
  from Person.Person as per
 order by per.LastName
        , per.FirstName

If three or fewer columns are in the order by and have short names, you may choose to have them occupy the same line for brevity.

-- shortcut for small columns
select per.FirstName, per.LastName
  from Person.Person as per
 order by per.LastName, per.FirstName
@ajlive

ajlive commented Oct 18, 2021

I have been toying with a reference formatter that uses Python's SQL Parse, but have not had the time to devote to finishing it. https://pypi.org/project/sqlparse/

Interesting. I would volunteer to help...if I also didn't have the time to devote to it :D

@galador

galador commented Oct 22, 2021

I'm liking this style guide!

I've got one question. How come you don't recommend a format like the below? i.e Put the selected name at the beginning followed by an '='.

select top 100 SomeId = AnotherId, SomeMetric = MetricX+ MetricY from table

I'm not the author, but my guess is that this isn't recommended because it's a T-SQL (SQL Server) extension to standard SQL. It's not supported in many other database engines.

I personally find it confusing because I expect the left-most part of the expression to be the column name you're selecting from, not the final column alias name.

Also, new to markdown, I'm trying really hard to get the sql snippet above to display on multiple lines above but nothing I've tried works!

You want to use three backticks to start a multi-line block. This is what creates the snippet below:

select top 100 
       AnotherId as SomeId 
      ,MetricX + MetricY as SomeMetric
  from table

@JCarnall

@galador thanks for the feedback, thought it might be that.

I prefer the convention as I find it makes understanding the intent of a sql select a lot quicker.

Thanks for the markdown formatting tips too!

@kthejoker

kthejoker commented Oct 23, 2021

Just to point out how hard formatting is, your "order by" isn't aligned with the river in (at least) 3 examples.

I think an acceptable alternative is to add an additional indent to your on clauses since:

  1. they frequently involve complex, multi-line logic
  2. they are secondary to the join itself (in terms of the river being "an important summary of what this query is doing")

@liam-caffrey-cs

Yes, I think that the join operators are subordinate to the from clause and should be left justified (and indented as necessary) to the right of the river.

Is there an argument to put the leading commas in the river? With a tabindent of 3, then 2 tabs plus a comma positions you to the right of the river ready to type the column name. Putting a comma in position 6 just means messing around with backspaces. I realise that there should be a space after a comma in lists but in the case of a stacked list like the select list, I feel it is ok to put the comma in position 7 and the start of the column name in position 8.

Does anyone know a lint/parse tool that can be configured to produce the river style? I've tried SQLFluff (which is a great tool) but it seems a long way off the river style without diving deep into the internals of the tool.
