This is the style guide for SQL queries for the MusicBrainz Server project. It is based on https://github.com/meadmaker/sql-style-guide
-
Keywords (AS, FROM, SELECT, WHERE, etc) should be UPPERCASED, but...
-
Function names (array_agg, count, json_build_object) should be lowercased, even if some of them are officially keywords.
/* Good */ SELECT count(*) FROM url WHERE url LIKE '%wikidata%'; /* Bad */ select count(*) from url where url like '%wikidata%';
- Names for tables and columns (including aliases) should always be lowercased and contain only ASCII letters and underscores.
-
Newlines should be used for any query that is at all complex or longer than 78 characters.
-
Each clause should begin a new line. SELECT, JOIN, LEFT JOIN, OUTER JOIN, WHERE, UNION, etc. are keywords that begin new clauses.
/* Good */ SELECT * FROM recording_first_release_date WHERE recording = ANY(?); /* Bad */ SELECT * FROM recording_first_release_date WHERE recording = ANY(?);
-
The keywords that begin a clause should be right-aligned. The idea is to make a single character column between the keywords and their objects.
/* Good */ SELECT * FROM recording_first_release_date WHERE recording = ANY(?); /* Bad */ SELECT * FROM recording_first_release_date WHERE recording = ANY(?);
-
Each column in a SELECT statement should be in a new line.
/* Good */ SELECT id, artist_credit, name FROM release; /* Bad */ SELECT id, artist_credit, name FROM release;
-
Subqueries and other parentheses-enclosed blocks should be aligned as though the open parenthesis were the 0-column. So, they should be indented as a unit, to identify them as a block. Subqueries should continue to have the opening keywords right-aligned.
/* Good */ SELECT DISTINCT release.id AS release_id, release.name, row_number() OVER (ORDER BY release.name COLLATE musicbrainz) FROM ( SELECT medium.release, min(medium.position) AS first_medium, max(medium.position) AS last_medium, count(medium.position) AS medium_count, sum(medium.position) AS medium_pos_acc FROM medium GROUP BY medium.release) medium_sequence_data JOIN release ON release.id = medium_sequence_data.release WHERE first_medium != 1 OR last_medium != medium_count OR (medium_count * (1 + medium_count)) / 2 <> medium_pos_acc; /* Bad */ SELECT DISTINCT release.id AS release_id, release.name, row_number() OVER (ORDER BY release.name COLLATE musicbrainz) FROM (SELECT medium.release, min(medium.position) AS first_medium, max(medium.position) AS last_medium, count(medium.position) AS medium_count, sum(medium.position) AS medium_pos_acc FROM medium GROUP BY medium.release) medium_sequence_data JOIN release ON release.id = medium_sequence_data.release WHERE first_medium != 1 OR last_medium != medium_count OR (medium_count * (1 + medium_count)) / 2 <> medium_pos_acc;
/* Good */ INSERT INTO release (id, gid, name, release_group, artist_credit) VALUES (1, '7b906020-72db-11de-8a39-0800200c9a70', 'Release Group', 1, 1), (2, '7c906020-72db-11de-8a39-0800200c9a71', 'Release Group', 1, 1); /* Bad */ INSERT INTO release (id, gid, name, release_group, artist_credit) VALUES (1, '7b906020-72db-11de-8a39-0800200c9a70', 'Release Group', 1, 1), (2, '7c906020-72db-11de-8a39-0800200c9a71', 'Release Group', 1, 1);
-
Column aliases should always use the keyword AS This becomes significant when a query has several columns selected with columns aliased. Without the AS keyword, a dropped comma makes two columns become a single aliased column.
/* Good */ SELECT DISTINCT release.id AS release_id, release.name, row_number() OVER (ORDER BY release.name COLLATE musicbrainz) FROM release; /* Bad */ SELECT DISTINCT release.id release_id, release.name, row_number() OVER (ORDER BY release.name COLLATE musicbrainz) FROM release;
-
Table aliases and column aliases should be descriptive. Much like variable names, "a", "b", "x", etc are not generally useful in and of themselves outside of short examples.
-
Tiny names for table aliases can sometimes work as abbreviations. As an example, if "releases" is referenced frequently, it might make sense to abbreviate it "r". However, "rel" is almost as short, and much more descriptive. Have a good reason for "r" instead of "rel".
-
Subquery aliases should be even more descriptive. Subqueries effectively create ad-hoc tables in memory. As such, if you name it "x", then there's absolutely nothing to suggest the intention behind the table to a later maintainer.
/* Good */ SELECT * FROM ( SELECT medium.release, [...] GROUP BY medium.release ) medium_sequence_data JOIN release ON release.id = medium_sequence_data.release; /* Bad */ SELECT * FROM ( SELECT medium.release, [...] GROUP BY medium.release ) s JOIN release ON release.id = s.release;
- Abbreviations don't help in subquery aliases
-
Newlines should be used for any query inside Perl code which would otherwise make its line longer than 78 characters.
-
When using newlines inside Perl code, prefer heredocs to multiline strings.
/* Good */ $c->sql->do(<<~'SQL'); INSERT INTO artist (id, gid, name, sort_name) VALUES (1, 'a9d99e40-72d7-11de-8a39-0800200c9a66', 'Name', 'Name') SQL /* Bad */ $c->sql->do(q{INSERT INTO artist (id, gid, name, sort_name) VALUES (1, 'a9d99e40-72d7-11de-8a39-0800200c9a66', 'Name', 'Name')});
-
For long multiline queries that make use of Perl variables, prefer interpolation to appending strings, since it allows using heredoc blocks and usually improves readability.
/* Good */ my $tag_table = $self->tag_table; my $tag_table_entity_column = $self->type; my $query = <<~"SQL"; SELECT tag.name, entity_tag.count, tag.id AS tag_id, genre.id AS genre_id FROM $tag_table entity_tag JOIN tag ON tag.id = entity_tag.tag LEFT JOIN genre ON tag.name = genre.name WHERE $tag_table_entity_column = ? ORDER BY entity_tag.count DESC, tag.name COLLATE musicbrainz SQL /* Bad */ my $query = ' SELECT tag.name, entity_tag.count, tag.id AS tag_id, genre.id AS genre_id FROM ' . $self->tag_table . ' entity_tag JOIN tag ON tag.id = entity_tag.tag LEFT JOIN genre ON tag.name = genre.name WHERE ' . $self->type . ' = ? ORDER BY entity_tag.count DESC, tag.name COLLATE musicbrainz';
I'd suggest function names should always be lowercase, even if they're considered keywords (e.g. min, max, count, avg, array_agg, etc. -- ones part of the SQL standard). This seems less confusing, because there are a lot of built-in postgres functions that aren't part of the SQL standard, and it would feel arbitrary to uppercase e.g. ARRAY_AGG but not JSON_BUILD_OBJECT. It also feels more consistent with user-defined functions, which are already always lowercased in our code.