reosarevok/sql_guide.md Secret

## sql_guide.md

      
    Raw
  

              sql_guide.md
            
          
    SQL style guide

This is the style guide for SQL queries for the MusicBrainz Server project. It is based on https://github.com/meadmaker/sql-style-guide
Formatting

Keywords


Keywords (AS, FROM, SELECT, WHERE, etc) should be UPPERCASED, but...


Function names (array_agg, count, json_build_object) should be lowercased, even if some of them are officially keywords.
/* Good */
SELECT count(*) FROM url WHERE url LIKE '%wikidata%';

/* Bad */
select count(*) from url where url like '%wikidata%';


Names


Names for tables and columns (including aliases) should always be lowercased and contain only ASCII letters and underscores.

Indentation and newlines


Newlines should be used for any query that is at all complex or longer than 78 characters.


Each clause should begin a new line.
SELECT, JOIN, LEFT JOIN, OUTER JOIN, WHERE, UNION, etc. are keywords that begin new clauses.
/* Good */
SELECT *
  FROM recording_first_release_date
 WHERE recording = ANY(?);

/* Bad */
SELECT * FROM recording_first_release_date
WHERE recording = ANY(?);


The keywords that begin a clause should be right-aligned.
The idea is to make a single character column between the keywords and their objects.
/* Good */
SELECT *
  FROM recording_first_release_date
 WHERE recording = ANY(?);

/* Bad */
SELECT *
FROM recording_first_release_date
WHERE recording = ANY(?);


Each column in a SELECT statement should be in a new line.
/* Good */
SELECT id,
       artist_credit,
       name
  FROM release;

/* Bad */
SELECT id, artist_credit, name
  FROM release;


Subqueries and other parentheses-enclosed blocks should be aligned as though the open parenthesis were the 0-column. So, they should be indented as a unit, to identify them as a block. Subqueries should continue to have the opening keywords right-aligned.
/* Good */
SELECT DISTINCT release.id AS release_id,
                release.name,
                row_number() OVER (ORDER BY release.name COLLATE musicbrainz)
           FROM (  SELECT medium.release,
                          min(medium.position) AS first_medium,
                          max(medium.position) AS last_medium,
                          count(medium.position) AS medium_count,
                          sum(medium.position) AS medium_pos_acc
                     FROM medium
                 GROUP BY medium.release) medium_sequence_data
           JOIN release ON release.id = medium_sequence_data.release
          WHERE first_medium != 1
             OR last_medium != medium_count
             OR (medium_count * (1 + medium_count)) / 2 <> medium_pos_acc;

/* Bad */
SELECT DISTINCT release.id AS release_id,
                release.name,
                row_number() OVER (ORDER BY release.name COLLATE musicbrainz)
           FROM (SELECT medium.release,
                        min(medium.position) AS first_medium,
                        max(medium.position) AS last_medium,
                        count(medium.position) AS medium_count,
                        sum(medium.position) AS medium_pos_acc
               FROM medium
               GROUP BY medium.release) medium_sequence_data
           JOIN release ON release.id = medium_sequence_data.release
          WHERE first_medium != 1
             OR last_medium != medium_count
             OR (medium_count * (1 + medium_count)) / 2 <> medium_pos_acc;
/* Good */
INSERT INTO release (id, gid, name, release_group, artist_credit)
     VALUES (1,
             '7b906020-72db-11de-8a39-0800200c9a70',
             'Release Group',
             1,
             1),
            (2,
             '7c906020-72db-11de-8a39-0800200c9a71',
             'Release Group',
             1,
             1);

/* Bad */
INSERT INTO release (id, gid, name, release_group, artist_credit)
     VALUES (1, '7b906020-72db-11de-8a39-0800200c9a70', 'Release Group', 1, 1),
       (2, '7c906020-72db-11de-8a39-0800200c9a71', 'Release Group', 1, 1);


Structure


Column aliases should always use the keyword AS
This becomes significant when a query has several columns selected with columns aliased.  Without the AS keyword, a dropped comma makes two columns become a single aliased column.
/* Good */
SELECT DISTINCT release.id AS release_id,
                release.name,
                row_number() OVER (ORDER BY release.name COLLATE musicbrainz)
           FROM release;

/* Bad */
SELECT DISTINCT release.id release_id,
                release.name,
                row_number() OVER (ORDER BY release.name COLLATE musicbrainz)
           FROM release;


Table aliases and column aliases should be descriptive.
Much like variable names, "a", "b", "x", etc are not generally useful in and of themselves outside of short examples.


Tiny names for table aliases can sometimes work as abbreviations.
As an example, if "releases" is referenced frequently, it might make sense to abbreviate it "r".  However, "rel" is almost as short, and much more descriptive.  Have a good reason for "r" instead of "rel".


Subquery aliases should be even more descriptive.
Subqueries effectively create ad-hoc tables in memory.  As such, if you name it "x", then there's absolutely nothing to suggest the intention behind the table to a later maintainer.
/* Good */
SELECT *
  FROM (  SELECT medium.release,
           [...]
        GROUP BY medium.release
  ) medium_sequence_data
  JOIN release ON release.id = medium_sequence_data.release;

/* Bad */
SELECT *
  FROM (  SELECT medium.release,
           [...]
        GROUP BY medium.release
  ) s
  JOIN release ON release.id = s.release;

Abbreviations don't help in subquery aliases


Queries inside Perl code


Newlines should be used for any query inside Perl code which would otherwise make its line longer than 78 characters.


When using newlines inside Perl code, prefer heredocs to multiline strings.
/* Good */
$c->sql->do(<<~'SQL');
    INSERT INTO artist (id, gid, name, sort_name)
         VALUES (1, 'a9d99e40-72d7-11de-8a39-0800200c9a66', 'Name', 'Name')
    SQL

/* Bad */
$c->sql->do(q{INSERT INTO artist (id, gid, name, sort_name)
                 VALUES (1, 'a9d99e40-72d7-11de-8a39-0800200c9a66', 'Name', 'Name')});


For long multiline queries that make use of Perl variables, prefer interpolation to appending strings, since it allows using heredoc blocks and usually improves readability.
/* Good */
my $tag_table = $self->tag_table;
my $tag_table_entity_column = $self->type;
my $query = <<~"SQL";
       SELECT tag.name,
              entity_tag.count,
              tag.id AS tag_id,
              genre.id AS genre_id
         FROM $tag_table entity_tag
         JOIN tag ON tag.id = entity_tag.tag
    LEFT JOIN genre ON tag.name = genre.name
        WHERE $tag_table_entity_column = ?
     ORDER BY entity_tag.count DESC,
              tag.name COLLATE musicbrainz
    SQL

/* Bad */
my $query = '   SELECT tag.name,
                       entity_tag.count,
                       tag.id AS tag_id,
                       genre.id AS genre_id
                  FROM ' . $self->tag_table . ' entity_tag
                  JOIN tag ON tag.id = entity_tag.tag
             LEFT JOIN genre ON tag.name = genre.name
                 WHERE ' . $self->type . ' = ?
              ORDER BY entity_tag.count DESC,
                       tag.name COLLATE musicbrainz';