arthurgousset/sql.md

## sql.md

      
    Raw
  

              sql.md
            
          
    🤷‍♂️ SQL (noob notes)


PostgreSQL docs

Context

These are noob notes for writing SQL queries (mostly notes-to-self).
I use SQL Tutorial and W3School to look up syntax questions and other details.
To practice concepts I use "easy" HackerRank SQL questions.


🤷‍♂️ SQL (noob notes)

Context
Querying data from a table
Using SQL Operators

Arithmetic Operators
Comparison Operators
Logical operators


Using COUNT(), AVG() and SUM()
WITH Clause


Querying data from a table

Query data in columns c1, c2 from a table
SELECT c1, c2 FROM t;
Query all rows and columns from a table
SELECT * FROM t;
Query data and filter rows with a condition
SELECT c1, c2 FROM t WHERE condition;
Query distinct rows from a table.
SELECT DISTINCT c1 FROM t WHERE condition;

Return only distinct (different) values.
Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.

Source: W3School
Sort the result set in ascending or descending order
SELECT c1, c2 FROM t ORDER BY c1 ASC [DESC];
Aliases for columns

The AS command is used to rename a column or table with an alias.
An alias only exists for the duration of the query.

SELECT CustomerID AS ID, CustomerName AS Customer FROM Customers;
Source: W3School
Aliases for tables
We use aliases to make the SQL shorter
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Customers AS c, Orders AS o
WHERE c.CustomerName="Around the Horn" AND c.CustomerID=o.CustomerID;
Source: W3School
Using SQL Operators

Arithmetic Operators

Add
SELECT 30 + 20;
Subtract
SELECT 30 - 20;
Multiply
SELECT 30 * 20;
Divide
SELECT 30 / 10;
Modulo
SELECT 17 % 5;
>>> 2
Source: W3School
Comparison Operators


=	Equal to
>	Greater than
<	Less than
>=	Greater than or equal to
<=	Less than or equal to
<>	Not equal to

Source: W3School
Logical operators

ALL	TRUE if all of the subquery values meet the condition
SELECT ProductName 
FROM Products
WHERE ProductID = ALL (
  SELECT ProductID 
  FROM OrderDetails 
  WHERE Quantity = 10
);
AND	TRUE if all the conditions separated by AND is TRUE
SELECT * FROM Customers
WHERE City = "London" AND Country = "UK";

ANY	(or SOME) TRUE if any of the subquery values meet the condition.
ANY and SOME perform the same and function and probably exist because SQL was standardised late (Source: Stack Overflow - Why are they same with different names?)
SELECT * FROM Products
WHERE Price > ANY (
  SELECT Price 
  FROM Products 
  WHERE Price > 50
);
SELECT * FROM Products
WHERE Price > SOME (
  SELECT 
  Price 
  FROM Products 
  WHERE Price > 20
);
BETWEEN	TRUE if the operand is within the range of comparisons
SELECT * FROM Products
WHERE Price BETWEEN 50 AND 60;
EXISTS	TRUE if the subquery returns one or more records
SELECT SupplierName
FROM Suppliers
WHERE EXISTS (
  SELECT ProductName 
  FROM Products 
  WHERE Products.SupplierID = Suppliers.supplierID
  AND Price < 20
);
IN	TRUE if the operand is equal to one of a list of expressions
SELECT * FROM Customers
WHERE City IN ('Paris','London');
LIKE	TRUE if the operand matches a pattern
SELECT * FROM Customers
WHERE City LIKE 's%';
NOT	Displays a record if the condition(s) is NOT TRUE
SELECT * FROM Customers
WHERE City NOT LIKE 's%';
OR TRUE if any of the conditions separated by OR is TRUE
SELECT * FROM Customers
WHERE City = "London" OR Country = "UK";
Source: W3School
Using COUNT(), AVG() and SUM()


The COUNT() function returns the number of rows that matches a specified criterion.
The AVG() function returns the average value of a numeric column.
The SUM() function returns the total sum of a numeric column.

Example:
SELECT COUNT(column_name) - COUNT(DISTINCT column_name)
FROM table_name
WHERE condition;
>>> 13
Source: W3School
WITH Clause


The WITH clause in SQL was introduced in standard SQL to simplify complex long queries, especially those with JOINs and subqueries.
A WITH clause defines a temporary data set whose output is available to be referenced in subsequent queries.

Source: learnsql.com
with issuer_counts as (
    select selected.issuer as selected_issuer,
        CASE
            WHEN attested.identifier is NOT NULL THEN true
            ELSE false
        END  as successful, count(*) as count
    from attestations.attestationproxy_evt_attestationissuerselected as selected
    left join attestations.attestationproxy_evt_attestationcompleted as attested on 
        selected.account = attested.account and 
        selected.identifier = attested.identifier and 
        selected.issuer = attested.issuer
    where date(selected.evt_block_time) >= '{{from}}' and date(selected.evt_block_time) <= '{{to}}'
    group by (selected_issuer, successful)
),

successful as (
select * from issuer_counts where successful=true
),

fails as (
select * from issuer_counts where successful=false
),

ten_percent as (
    select (0.1 * count(distinct issuer)) as n_10p from attestations.attestationproxy_evt_attestationissuerselected
),

everyone as (
    select successful.count as n_s, fails.count as n_f, ((successful.count * 100.00)/ (successful.count + fails.count)) as success_rate
    from successful
    inner join fails on successful.selected_issuer = fails.selected_issuer
),

top_10p as (
    select * from everyone
    order by success_rate desc
    LIMIT (select n_10p from ten_percent)
),
bottom_10p as (
    select * from everyone
    order by success_rate asc
    LIMIT (select n_10p from ten_percent)
)

select sum(bottom_10p.n_s) * 100 / (sum(bottom_10p.n_s) + sum(bottom_10p.n_f)) as bottom_10p_rate,
sum(everyone.n_s) * 100 / (sum(everyone.n_s) + sum(everyone.n_f)) as overall_avg,
sum(top_10p.n_s) * 100 / (sum(top_10p.n_s) + sum(top_10p.n_f)) as top_10p_rate
from top_10p, everyone, bottom_10p
FROM Clause

The FROM clause specifies one or more source tables for the SELECT. If multiple sources are specified, the result is the Cartesian product (cross join) of all the sources.
But usually qualification conditions are added (via WHERE) to restrict the returned rows to a small subset of the Cartesian product.
TABLESAMPLE sampling_method ( argument )

A TABLESAMPLE clause after a table_name indicates that the specified sampling_method should be used to retrieve a subset of the rows in that table. This sampling precedes the application of any other filters such as WHERE clauses.
Arguments:

The standard PostgreSQL distribution includes two sampling methods, BERNOULLI and SYSTEM, and other sampling methods can be installed in the database via extensions.
The BERNOULLI and SYSTEM sampling methods each accept a single argument which is the fraction of the table to sample, expressed as a percentage between 0 and 100. This argument can be any real-valued expression.

Return:

These two methods each return a randomly-chosen sample of the table that will contain approximately the specified percentage of the table's rows.

SELECT   
    *
FROM 
    attestations.attestationproxy_evt_attestationissuerselectedv2
TABLESAMPLE 
    bernoulli ( 0.01 )
ORDER BY RANDOM()

Sometimes TABLESAMPLE BERNOULLI doesn't work due to the nature of the SQL table/view etc. In that case simply use:
SELECT   
    *
FROM 
    attestations.attestationproxy_evt_attestationissuerselectedv2
ORDER BY RANDOM()
LIMIT 100
Source: Stack Overflow
Date/Time Functions and Operators

date_trunc(field, source)

Parameters:

source is a value expression of type timestamp, timestamp with time zone, or interval
field selects to which precision to truncate the input value

microseconds
milliseconds
second
minute
hour
day
week
month
quarter
year
decade
century
millennium


Return:

type timestamp, timestamp with time zone, or interval, and it has all fields that are less significant than the selected one set to zero (or one, for day and month)

Example:
SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-02-16 20:00:00

SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-01-01 00:00:00

SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00');
Result: 2001-02-16 00:00:00-05

SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00', 'Australia/Sydney');
Result: 2001-02-16 08:00:00-05

SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
Result: 3 days 02:00:00
Conditional Expressions

CASE WHEN

Example:
SELECT * FROM test;

 a
---
 1
 2
 3


SELECT a,
       CASE WHEN a=1 THEN 'one'
            WHEN a=2 THEN 'two'
            ELSE 'other'
       END
    FROM test;

 a | case
---+-------
 1 | one
 2 | two
 3 | other
Source: Postgresql docs
OVER

SELECT Id, StudentName, StudentGender, StudentAge,
SUM (StudentAge) OVER (ORDER BY Id) AS RunningAgeTotal
FROM Students
Source: Calculating Running Total with OVER Clause and PARTITION BY Clause in SQL Server
PARTITION BY

SELECT Id, StudentName, StudentGender, StudentAge,
SUM (StudentAge) OVER (PARTITION BY StudentGender ORDER BY Id) AS RunningAgeTotal
FROM Students
Source: Calculating Running Total with OVER Clause and PARTITION BY Clause in SQL Server
Arithmetic

Growth rate calculation

-- Suppose this this the data
 month | sale
-------+------
     1 | 2021
     2 | 2102
     3 | 2150
     4 | 2312
     5 | 2425

-- Growth rate query

SELECT 
  month,
  sum(sale) AS current_sale,
  lag(sum(sale), 1) OVER (ORDER BY month) AS previous_month_sale,
  (100 * (sum(sale) - lag(sum(sale), 1) OVER (ORDER BY month)) / lag(sum(sale), 1) OVER (ORDER BY month)) || '%' AS growth
FROM sales_data
GROUP BY 1
ORDER BY 1;

 month | current_sale | previous_month_sale | growth
-------+--------------+---------------------+--------
     1 |         2021 |                     |
     2 |         2102 |                2021 | 4%
     3 |         2150 |                2102 | 2%
     4 |         2312 |                2150 | 7%
     5 |         2425 |                2312 | 4%
Source: How To Calculate Month over Month Growth in PostgreSQL
Round numbers down using ::numeric (shorthand) or round(val::numeric, 2).
Source: PostgreSQL ROUND Function
Add the percentage sign using concat(val, '%') or the shorthand || for concatenation.
Source: 3 Ways to Format a Number as a Percentage in PostgreSQL
SUBSTRING()

substring(string [from <str_pos>] [for <ext_char>])
Source: w3resource
Parameters:

string: The main string from where the character to be extracted.
str_pos (Optional): The position of the string from where the extracting will be starting. If this parameter is omitted, the substring function will start at position 1 (which is the first position in the string).
ext_char (Optional): A number of characters to be extracted from the string. If this parameter is omitted, the substring function will return the entire string (from the start_position to the end of the string).


Source: w3resource
IN


You use IN operator in the WHERE clause to check if a value matches any value in a list of values.

value IN (value1,value2,...)
Source: Postgresql tutorial
IN with a subquery


The list of values can be a list of literal values such as numbers, strings or a result of a SELECT statement like this:

value IN (SELECT column_name FROM table_name);
SELECT
  customer_id,
  first_name,
  last_name
FROM
  customer
WHERE
  customer_id IN (
    SELECT customer_id
    FROM rental
    WHERE CAST (return_date AS DATE) = '2005-05-27'
  )
ORDER BY customer_id;
Source: Postgresql tutorial
pg_typeof() (postgresql)


If anyone else wonders How to just get data type of a varible (not column) you can use the pg_typeof(any) function.

SELECT pg_typeof(your_variable);
Source: Stack Overflow
Joins

CROSS JOIN


CROSS JOIN matches every row of the first table with every row of the second table. If the input tables have x and y columns, respectively, the resulting table will have x+y columns. Because CROSS JOINs have the potential to generate extremely large tables, care must be taken to use them only when appropriate.

SELECT ... 
FROM table1 
CROSS JOIN table2 ...
Source: Tutorialpoint
INNER JOIN (default)


A INNER JOIN creates a new result table by combining column values of two tables (table1 and table2) based upon the join-predicate. The query compares each row of table1 with each row of table2 to find all pairs of rows, which satisfy the join-predicate. When the join-predicate is satisfied, column values for each matched pair of rows of table1 and table2 are combined into a result row.
An INNER JOIN is the most common type of join and is the default type of join. You can use INNER keyword optionally.

SELECT table1.column1, table2.column2...
FROM table1
INNER JOIN table2
ON table1.common_filed = table2.common_field;
Source: Tutorialpoint
GROUP BY


recall the following general rule: each column in the SELECT statement should either be called in an aggregate function or be in the GROUP BY clause.

Source: How to Fix a 'Not a GROUP BY Expression' Error
ORA-00979 “Not a GROUP BY expression” is an error issued by the Oracle database when the SELECT statement contains a column that is neither listed in GROUP BY nor aggregated. This error message can be confusing to beginners.
Greatest N per group problem


find the group-identifier, max-value-in-group in a sub-query
join your table to the sub-query with equality on both group-identifier and max-value-in-group

E.g.
SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
Source: Stack Overflow