richb-hanover/QueryChinook.md

## QueryChinook.md

      
    Raw
  

              QueryChinook.md
            
          
    PRQL Tutorial

The PRQL language allows you to build a pipeline from a series of SQL-like commands,
then compile those commands to correct SQL that can be used to make actual queries.
The PRQL Playground provides a way to, well, play with entering PRQL statements to see
the resulting SQL.
Enter the PRQL in the middle column, see the resulting SQL on the right.
(Click output.sql)
In addition, the Playground is pre-loaded with the Chinook database that contains several tables
regarding record albums, composers, artists,
sales data and invoices for those albums.
Click the artists-0.prql link on the right to see the commands below.
Or copy/paste the items from the table one at a time to see how the result on the right changes.
(Be sure to click output.arrow.)
You may also watch the Youtube video to see how the output on the left changes as the commands
below are copy-pasted into the Playground.


   PRQL Statement   Description 


from tracks


 All PRQL pipelines begin with a from statement that identifies a beginning table for the pipeline. 


select [
  album_id,
  name,
  unit_price,
]


 The select statement takes a list  
in [ ... ]
and passes those column names to the next statement.
Lists are comma-separated, and trailing commas are allowed.
A list can be on one or multiple lines.


sort [-unit_price, name]


 sort also takes a list of column names. The "-" means descending sort. 


group album_id (
  aggregate [
    track_count = count,
    album_price = sum unit_price
    ]
)


 The group statement groups by its argument(s) (that may be a list.)
The aggregate takes a list, and creates new columns from the aggregated values.


join albums [==album_id]


 The join statement joins the previous table with another table
matching on the column(s) named in the list.
If the match columns have the same name in both tables, use == to save typing.


group artist_id (
  aggregate [
    track_count = sum track_count,
    artist_price = sum album_price
    ]
)


 This group collects the "track_count" and "artist_price" for each artist_id.
Note the use of track_count = ... to set an alias for the column. 
This alias is used as the column heading and to refer to this column in subsequent statements. 


join artists [==artist_id]
select [artists.name, artist_price, track_count]


 Another join and select. 
Note the use of artists.name to uniquely specify which "name" is desired.


sort [-artist_price]


 Another sort placing the "most expensive" artist price at the top.


derive avg_track_price = artist_price / track_count


 The derive statement computes a new column for the table, 
setting a name for that new column if a name = ... is specified.
PRQL Statement	Description
`from tracks`	All PRQL pipelines begin with a `from` statement that identifies a beginning table for the pipeline.
`select [ album_id, name, unit_price, ]`	The `select` statement takes a list in `[ ... ]` and passes those column names to the next statement. Lists are comma-separated, and trailing commas are allowed. A list can be on one or multiple lines.
`sort [-unit_price, name]`	`sort` also takes a list of column names. The "-" means descending sort.
`group album_id ( aggregate [ track_count = count, album_price = sum unit_price ] )`	The `group` statement groups by its argument(s) (that may be a list.) The `aggregate` takes a list, and creates new columns from the aggregated values.
`join albums [==album_id]`	The `join` statement joins the previous table with another table matching on the column(s) named in the list. If the match columns have the same name in both tables, use `==` to save typing.
`group artist_id ( aggregate [ track_count = sum track_count, artist_price = sum album_price ] )`	This `group` collects the "track_count" and "artist_price" for each `artist_id`. Note the use of `track_count = ...` to set an alias for the column. This alias is used as the column heading and to refer to this column in subsequent statements.
`join artists [==artist_id] select [artists.name, artist_price, track_count]`	Another `join` and `select`. Note the use of `artists.name` to uniquely specify which "name" is desired.
`sort [-artist_price]`	Another `sort` placing the "most expensive" artist price at the top.
`derive avg_track_price = artist_price / track_count`	The `derive` statement computes a new column for the table, setting a name for that new column if a `name = ...` is specified.