Skip to content

Instantly share code, notes, and snippets.

@richb-hanover
Last active January 10, 2023 04:37
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save richb-hanover/65206c3fc360ab245967c1402068d3e1 to your computer and use it in GitHub Desktop.
Save richb-hanover/65206c3fc360ab245967c1402068d3e1 to your computer and use it in GitHub Desktop.
Narrative for PRQL queries in the Chinook database

PRQL Tutorial

The PRQL language allows you to build a pipeline from a series of SQL-like commands, then compile those commands to correct SQL that can be used to make actual queries.

The PRQL Playground provides a way to, well, play with entering PRQL statements to see the resulting SQL. Enter the PRQL in the middle column, see the resulting SQL on the right. (Click output.sql)

In addition, the Playground is pre-loaded with the Chinook database that contains several tables regarding record albums, composers, artists, sales data and invoices for those albums. Click the artists-0.prql link on the right to see the commands below. Or copy/paste the items from the table one at a time to see how the result on the right changes. (Be sure to click output.arrow.)

You may also watch the Youtube video to see how the output on the left changes as the commands below are copy-pasted into the Playground.

PRQL Statement Description
from tracks
All PRQL pipelines begin with a from statement that identifies a beginning table for the pipeline.
select [
  album_id,
  name,
  unit_price,
]
The select statement takes a list in [ ... ] and passes those column names to the next statement. Lists are comma-separated, and trailing commas are allowed. A list can be on one or multiple lines.
sort [-unit_price, name]
sort also takes a list of column names. The "-" means descending sort.
group album_id (
  aggregate [
    track_count = count,
    album_price = sum unit_price
    ]
)
The group statement groups by its argument(s) (that may be a list.) The aggregate takes a list, and creates new columns from the aggregated values.
join albums [==album_id]
The join statement joins the previous table with another table matching on the column(s) named in the list. If the match columns have the same name in both tables, use == to save typing.
group artist_id (
  aggregate [
    track_count = sum track_count,
    artist_price = sum album_price
    ]
)
This group collects the "track_count" and "artist_price" for each artist_id. Note the use of track_count = ... to set an alias for the column. This alias is used as the column heading and to refer to this column in subsequent statements.
join artists [==artist_id]
select [artists.name, artist_price, track_count]
Another join and select. Note the use of artists.name to uniquely specify which "name" is desired.
sort [-artist_price]
Another sort placing the "most expensive" artist price at the top.
derive avg_track_price = artist_price / track_count
The derive statement computes a new column for the table, setting a name for that new column if a name = ... is specified.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment