Skip to content

Instantly share code, notes, and snippets.

@arthurgousset
Last active February 8, 2024 11:41
Show Gist options
  • Save arthurgousset/69e26848d91a355b5eac07202e11224e to your computer and use it in GitHub Desktop.
Save arthurgousset/69e26848d91a355b5eac07202e11224e to your computer and use it in GitHub Desktop.
SQL (noob notess)

🤷‍♂️ SQL (noob notes)

Context

These are noob notes for writing SQL queries (mostly notes-to-self). I use SQL Tutorial and W3School to look up syntax questions and other details. To practice concepts I use "easy" HackerRank SQL questions.

Querying data from a table

Query data in columns c1, c2 from a table

SELECT c1, c2 FROM t;

Query all rows and columns from a table

SELECT * FROM t;

Query data and filter rows with a condition

SELECT c1, c2 FROM t WHERE condition;

Query distinct rows from a table.

SELECT DISTINCT c1 FROM t WHERE condition;
  • Return only distinct (different) values.
  • Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.

Source: W3School

Sort the result set in ascending or descending order

SELECT c1, c2 FROM t ORDER BY c1 ASC [DESC];

Aliases for columns

  • The AS command is used to rename a column or table with an alias.
  • An alias only exists for the duration of the query.
SELECT CustomerID AS ID, CustomerName AS Customer FROM Customers;

Source: W3School

Aliases for tables

We use aliases to make the SQL shorter

SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Customers AS c, Orders AS o
WHERE c.CustomerName="Around the Horn" AND c.CustomerID=o.CustomerID;

Source: W3School

Using SQL Operators

Arithmetic Operators

Add

SELECT 30 + 20;

Subtract

SELECT 30 - 20;

Multiply

SELECT 30 * 20;

Divide

SELECT 30 / 10;

Modulo

SELECT 17 % 5;
>>> 2

Source: W3School

Comparison Operators

  • = Equal to
  • > Greater than
  • < Less than
  • >= Greater than or equal to
  • <= Less than or equal to
  • <> Not equal to

Source: W3School

Logical operators

ALL TRUE if all of the subquery values meet the condition

SELECT ProductName 
FROM Products
WHERE ProductID = ALL (
  SELECT ProductID 
  FROM OrderDetails 
  WHERE Quantity = 10
);

AND TRUE if all the conditions separated by AND is TRUE

SELECT * FROM Customers
WHERE City = "London" AND Country = "UK";

ANY (or SOME) TRUE if any of the subquery values meet the condition.

ANY and SOME perform the same and function and probably exist because SQL was standardised late (Source: Stack Overflow - Why are they same with different names?)

SELECT * FROM Products
WHERE Price > ANY (
  SELECT Price 
  FROM Products 
  WHERE Price > 50
);
SELECT * FROM Products
WHERE Price > SOME (
  SELECT 
  Price 
  FROM Products 
  WHERE Price > 20
);

BETWEEN TRUE if the operand is within the range of comparisons

SELECT * FROM Products
WHERE Price BETWEEN 50 AND 60;

EXISTS TRUE if the subquery returns one or more records

SELECT SupplierName
FROM Suppliers
WHERE EXISTS (
  SELECT ProductName 
  FROM Products 
  WHERE Products.SupplierID = Suppliers.supplierID
  AND Price < 20
);

IN TRUE if the operand is equal to one of a list of expressions

SELECT * FROM Customers
WHERE City IN ('Paris','London');

LIKE TRUE if the operand matches a pattern

SELECT * FROM Customers
WHERE City LIKE 's%';

NOT Displays a record if the condition(s) is NOT TRUE

SELECT * FROM Customers
WHERE City NOT LIKE 's%';

OR TRUE if any of the conditions separated by OR is TRUE

SELECT * FROM Customers
WHERE City = "London" OR Country = "UK";

Source: W3School

Using COUNT(), AVG() and SUM()

  • The COUNT() function returns the number of rows that matches a specified criterion.
  • The AVG() function returns the average value of a numeric column.
  • The SUM() function returns the total sum of a numeric column.

Example:

SELECT COUNT(column_name) - COUNT(DISTINCT column_name)
FROM table_name
WHERE condition;
>>> 13

Source: W3School

WITH Clause

The WITH clause in SQL was introduced in standard SQL to simplify complex long queries, especially those with JOINs and subqueries.

A WITH clause defines a temporary data set whose output is available to be referenced in subsequent queries.

Source: learnsql.com

with issuer_counts as (
    select selected.issuer as selected_issuer,
        CASE
            WHEN attested.identifier is NOT NULL THEN true
            ELSE false
        END  as successful, count(*) as count
    from attestations.attestationproxy_evt_attestationissuerselected as selected
    left join attestations.attestationproxy_evt_attestationcompleted as attested on 
        selected.account = attested.account and 
        selected.identifier = attested.identifier and 
        selected.issuer = attested.issuer
    where date(selected.evt_block_time) >= '{{from}}' and date(selected.evt_block_time) <= '{{to}}'
    group by (selected_issuer, successful)
),

successful as (
select * from issuer_counts where successful=true
),

fails as (
select * from issuer_counts where successful=false
),

ten_percent as (
    select (0.1 * count(distinct issuer)) as n_10p from attestations.attestationproxy_evt_attestationissuerselected
),

everyone as (
    select successful.count as n_s, fails.count as n_f, ((successful.count * 100.00)/ (successful.count + fails.count)) as success_rate
    from successful
    inner join fails on successful.selected_issuer = fails.selected_issuer
),

top_10p as (
    select * from everyone
    order by success_rate desc
    LIMIT (select n_10p from ten_percent)
),
bottom_10p as (
    select * from everyone
    order by success_rate asc
    LIMIT (select n_10p from ten_percent)
)

select sum(bottom_10p.n_s) * 100 / (sum(bottom_10p.n_s) + sum(bottom_10p.n_f)) as bottom_10p_rate,
sum(everyone.n_s) * 100 / (sum(everyone.n_s) + sum(everyone.n_f)) as overall_avg,
sum(top_10p.n_s) * 100 / (sum(top_10p.n_s) + sum(top_10p.n_f)) as top_10p_rate
from top_10p, everyone, bottom_10p

The FROM clause specifies one or more source tables for the SELECT. If multiple sources are specified, the result is the Cartesian product (cross join) of all the sources.

But usually qualification conditions are added (via WHERE) to restrict the returned rows to a small subset of the Cartesian product.

A TABLESAMPLE clause after a table_name indicates that the specified sampling_method should be used to retrieve a subset of the rows in that table. This sampling precedes the application of any other filters such as WHERE clauses.

Arguments:

  • The standard PostgreSQL distribution includes two sampling methods, BERNOULLI and SYSTEM, and other sampling methods can be installed in the database via extensions.
  • The BERNOULLI and SYSTEM sampling methods each accept a single argument which is the fraction of the table to sample, expressed as a percentage between 0 and 100. This argument can be any real-valued expression.

Return:

  • These two methods each return a randomly-chosen sample of the table that will contain approximately the specified percentage of the table's rows.
SELECT   
    *
FROM 
    attestations.attestationproxy_evt_attestationissuerselectedv2
TABLESAMPLE 
    bernoulli ( 0.01 )

Sometimes TABLESAMPLE BERNOULLI doesn't work due to the nature of the SQL table/view etc. In that case simply use:

SELECT   
    *
FROM 
    attestations.attestationproxy_evt_attestationissuerselectedv2
ORDER BY RANDOM()
LIMIT 100

Source: Stack Overflow

Parameters:

  • source is a value expression of type timestamp, timestamp with time zone, or interval
  • field selects to which precision to truncate the input value
    • microseconds
    • milliseconds
    • second
    • minute
    • hour
    • day
    • week
    • month
    • quarter
    • year
    • decade
    • century
    • millennium

Return:

  • type timestamp, timestamp with time zone, or interval, and it has all fields that are less significant than the selected one set to zero (or one, for day and month)

Example:

SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-02-16 20:00:00

SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-01-01 00:00:00

SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00');
Result: 2001-02-16 00:00:00-05

SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00', 'Australia/Sydney');
Result: 2001-02-16 08:00:00-05

SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
Result: 3 days 02:00:00

Example:

SELECT * FROM test;

 a
---
 1
 2
 3


SELECT a,
       CASE WHEN a=1 THEN 'one'
            WHEN a=2 THEN 'two'
            ELSE 'other'
       END
    FROM test;

 a | case
---+-------
 1 | one
 2 | two
 3 | other

Source: Postgresql docs

OVER

SELECT Id, StudentName, StudentGender, StudentAge,
SUM (StudentAge) OVER (ORDER BY Id) AS RunningAgeTotal
FROM Students

Source: Calculating Running Total with OVER Clause and PARTITION BY Clause in SQL Server

PARTITION BY

SELECT Id, StudentName, StudentGender, StudentAge,
SUM (StudentAge) OVER (PARTITION BY StudentGender ORDER BY Id) AS RunningAgeTotal
FROM Students

Source: Calculating Running Total with OVER Clause and PARTITION BY Clause in SQL Server

Arithmetic

Growth rate calculation

-- Suppose this this the data
 month | sale
-------+------
     1 | 2021
     2 | 2102
     3 | 2150
     4 | 2312
     5 | 2425

-- Growth rate query

SELECT 
  month,
  sum(sale) AS current_sale,
  lag(sum(sale), 1) OVER (ORDER BY month) AS previous_month_sale,
  (100 * (sum(sale) - lag(sum(sale), 1) OVER (ORDER BY month)) / lag(sum(sale), 1) OVER (ORDER BY month)) || '%' AS growth
FROM sales_data
GROUP BY 1
ORDER BY 1;

 month | current_sale | previous_month_sale | growth
-------+--------------+---------------------+--------
     1 |         2021 |                     |
     2 |         2102 |                2021 | 4%
     3 |         2150 |                2102 | 2%
     4 |         2312 |                2150 | 7%
     5 |         2425 |                2312 | 4%

Source: How To Calculate Month over Month Growth in PostgreSQL

Round numbers down using ::numeric (shorthand) or round(val::numeric, 2).

Source: PostgreSQL ROUND Function

Add the percentage sign using concat(val, '%') or the shorthand || for concatenation.

Source: 3 Ways to Format a Number as a Percentage in PostgreSQL

SUBSTRING()

substring(string [from <str_pos>] [for <ext_char>])

Source: w3resource

Parameters:

  • string: The main string from where the character to be extracted.
  • str_pos (Optional): The position of the string from where the extracting will be starting. If this parameter is omitted, the substring function will start at position 1 (which is the first position in the string).
  • ext_char (Optional): A number of characters to be extracted from the string. If this parameter is omitted, the substring function will return the entire string (from the start_position to the end of the string).

image

Source: w3resource

IN

You use IN operator in the WHERE clause to check if a value matches any value in a list of values.

value IN (value1,value2,...)

Source: Postgresql tutorial

IN with a subquery

The list of values can be a list of literal values such as numbers, strings or a result of a SELECT statement like this:

value IN (SELECT column_name FROM table_name);
SELECT
  customer_id,
  first_name,
  last_name
FROM
  customer
WHERE
  customer_id IN (
    SELECT customer_id
    FROM rental
    WHERE CAST (return_date AS DATE) = '2005-05-27'
  )
ORDER BY customer_id;

Source: Postgresql tutorial

pg_typeof() (postgresql)

If anyone else wonders How to just get data type of a varible (not column) you can use the pg_typeof(any) function.

SELECT pg_typeof(your_variable);

Source: Stack Overflow

Joins

CROSS JOIN

CROSS JOIN matches every row of the first table with every row of the second table. If the input tables have x and y columns, respectively, the resulting table will have x+y columns. Because CROSS JOINs have the potential to generate extremely large tables, care must be taken to use them only when appropriate.

SELECT ... 
FROM table1 
CROSS JOIN table2 ...

Source: Tutorialpoint

INNER JOIN (default)

A INNER JOIN creates a new result table by combining column values of two tables (table1 and table2) based upon the join-predicate. The query compares each row of table1 with each row of table2 to find all pairs of rows, which satisfy the join-predicate. When the join-predicate is satisfied, column values for each matched pair of rows of table1 and table2 are combined into a result row.

An INNER JOIN is the most common type of join and is the default type of join. You can use INNER keyword optionally.

SELECT table1.column1, table2.column2...
FROM table1
INNER JOIN table2
ON table1.common_filed = table2.common_field;

Source: Tutorialpoint

GROUP BY

recall the following general rule: each column in the SELECT statement should either be called in an aggregate function or be in the GROUP BY clause.

Source: How to Fix a 'Not a GROUP BY Expression' Error

ORA-00979 “Not a GROUP BY expression” is an error issued by the Oracle database when the SELECT statement contains a column that is neither listed in GROUP BY nor aggregated. This error message can be confusing to beginners.

Greatest N per group problem

  1. find the group-identifier, max-value-in-group in a sub-query
  2. join your table to the sub-query with equality on both group-identifier and max-value-in-group

E.g.

SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev

Source: Stack Overflow

@eyo20
Copy link

eyo20 commented Feb 8, 2024

thank u bruh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment