These are noob notes for writing SQL queries (mostly notes-to-self). I use SQL Tutorial and W3School to look up syntax questions and other details. To practice concepts I use "easy" HackerRank SQL questions.
Query data in columns c1
, c2
from a table
SELECT c1, c2 FROM t;
Query all rows and columns from a table
SELECT * FROM t;
Query data and filter rows with a condition
SELECT c1, c2 FROM t WHERE condition;
Query distinct rows from a table.
SELECT DISTINCT c1 FROM t WHERE condition;
- Return only distinct (different) values.
- Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.
Source: W3School
Sort the result set in ascending or descending order
SELECT c1, c2 FROM t ORDER BY c1 ASC [DESC];
Aliases for columns
- The
AS
command is used to rename a column or table with an alias. - An alias only exists for the duration of the query.
SELECT CustomerID AS ID, CustomerName AS Customer FROM Customers;
Source: W3School
Aliases for tables
We use aliases to make the SQL shorter
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Customers AS c, Orders AS o
WHERE c.CustomerName="Around the Horn" AND c.CustomerID=o.CustomerID;
Source: W3School
Add
SELECT 30 + 20;
Subtract
SELECT 30 - 20;
Multiply
SELECT 30 * 20;
Divide
SELECT 30 / 10;
Modulo
SELECT 17 % 5;
>>> 2
Source: W3School
=
Equal to>
Greater than<
Less than>=
Greater than or equal to<=
Less than or equal to<>
Not equal to
Source: W3School
ALL
TRUE if all of the subquery values meet the condition
SELECT ProductName
FROM Products
WHERE ProductID = ALL (
SELECT ProductID
FROM OrderDetails
WHERE Quantity = 10
);
AND
TRUE if all the conditions separated by AND is TRUE
SELECT * FROM Customers
WHERE City = "London" AND Country = "UK";
ANY
(or SOME
) TRUE if any of the subquery values meet the condition.
ANY
and SOME
perform the same and function and probably exist because SQL was standardised late (Source: Stack Overflow - Why are they same with different names?)
SELECT * FROM Products
WHERE Price > ANY (
SELECT Price
FROM Products
WHERE Price > 50
);
SELECT * FROM Products
WHERE Price > SOME (
SELECT
Price
FROM Products
WHERE Price > 20
);
BETWEEN
TRUE if the operand is within the range of comparisons
SELECT * FROM Products
WHERE Price BETWEEN 50 AND 60;
EXISTS
TRUE if the subquery returns one or more records
SELECT SupplierName
FROM Suppliers
WHERE EXISTS (
SELECT ProductName
FROM Products
WHERE Products.SupplierID = Suppliers.supplierID
AND Price < 20
);
IN
TRUE if the operand is equal to one of a list of expressions
SELECT * FROM Customers
WHERE City IN ('Paris','London');
LIKE
TRUE if the operand matches a pattern
SELECT * FROM Customers
WHERE City LIKE 's%';
NOT
Displays a record if the condition(s) is NOT TRUE
SELECT * FROM Customers
WHERE City NOT LIKE 's%';
OR
TRUE if any of the conditions separated by OR is TRUE
SELECT * FROM Customers
WHERE City = "London" OR Country = "UK";
Source: W3School
- The
COUNT()
function returns the number of rows that matches a specified criterion. - The
AVG()
function returns the average value of a numeric column. - The
SUM()
function returns the total sum of a numeric column.
Example:
SELECT COUNT(column_name) - COUNT(DISTINCT column_name)
FROM table_name
WHERE condition;
>>> 13
Source: W3School
The
WITH
clause in SQL was introduced in standard SQL to simplify complex long queries, especially those with JOINs and subqueries.A
WITH
clause defines a temporary data set whose output is available to be referenced in subsequent queries.
Source: learnsql.com
with issuer_counts as (
select selected.issuer as selected_issuer,
CASE
WHEN attested.identifier is NOT NULL THEN true
ELSE false
END as successful, count(*) as count
from attestations.attestationproxy_evt_attestationissuerselected as selected
left join attestations.attestationproxy_evt_attestationcompleted as attested on
selected.account = attested.account and
selected.identifier = attested.identifier and
selected.issuer = attested.issuer
where date(selected.evt_block_time) >= '{{from}}' and date(selected.evt_block_time) <= '{{to}}'
group by (selected_issuer, successful)
),
successful as (
select * from issuer_counts where successful=true
),
fails as (
select * from issuer_counts where successful=false
),
ten_percent as (
select (0.1 * count(distinct issuer)) as n_10p from attestations.attestationproxy_evt_attestationissuerselected
),
everyone as (
select successful.count as n_s, fails.count as n_f, ((successful.count * 100.00)/ (successful.count + fails.count)) as success_rate
from successful
inner join fails on successful.selected_issuer = fails.selected_issuer
),
top_10p as (
select * from everyone
order by success_rate desc
LIMIT (select n_10p from ten_percent)
),
bottom_10p as (
select * from everyone
order by success_rate asc
LIMIT (select n_10p from ten_percent)
)
select sum(bottom_10p.n_s) * 100 / (sum(bottom_10p.n_s) + sum(bottom_10p.n_f)) as bottom_10p_rate,
sum(everyone.n_s) * 100 / (sum(everyone.n_s) + sum(everyone.n_f)) as overall_avg,
sum(top_10p.n_s) * 100 / (sum(top_10p.n_s) + sum(top_10p.n_f)) as top_10p_rate
from top_10p, everyone, bottom_10p
The FROM
clause specifies one or more source tables for the SELECT
. If multiple sources are specified, the result is the Cartesian product (cross join) of all the sources.
But usually qualification conditions are added (via WHERE
) to restrict the returned rows to a small subset of the Cartesian product.
A TABLESAMPLE
clause after a table_name
indicates that the specified sampling_method should be used to retrieve a subset of the rows in that table. This sampling precedes the application of any other filters such as WHERE
clauses.
Arguments:
- The standard PostgreSQL distribution includes two sampling methods,
BERNOULLI
andSYSTEM
, and other sampling methods can be installed in the database via extensions. - The BERNOULLI and SYSTEM sampling methods each accept a single argument which is the fraction of the table to sample, expressed as a percentage between 0 and 100. This argument can be any real-valued expression.
Return:
- These two methods each return a randomly-chosen sample of the table that will contain approximately the specified percentage of the table's rows.
SELECT
*
FROM
attestations.attestationproxy_evt_attestationissuerselectedv2
TABLESAMPLE
bernoulli ( 0.01 )
Sometimes TABLESAMPLE BERNOULLI
doesn't work due to the nature of the SQL table/view etc. In that case simply use:
SELECT
*
FROM
attestations.attestationproxy_evt_attestationissuerselectedv2
ORDER BY RANDOM()
LIMIT 100
Source: Stack Overflow
Parameters:
source
is a value expression of typetimestamp
,timestamp with time zone
, orinterval
field
selects to which precision to truncate the input valuemicroseconds
milliseconds
second
minute
hour
day
week
month
quarter
year
decade
century
millennium
Return:
- type
timestamp
,timestamp with time zone
, orinterval
, and it has all fields that are less significant than the selected one set to zero (or one, for day and month)
Example:
SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-02-16 20:00:00
SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-01-01 00:00:00
SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00');
Result: 2001-02-16 00:00:00-05
SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00', 'Australia/Sydney');
Result: 2001-02-16 08:00:00-05
SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
Result: 3 days 02:00:00
Example:
SELECT * FROM test;
a
---
1
2
3
SELECT a,
CASE WHEN a=1 THEN 'one'
WHEN a=2 THEN 'two'
ELSE 'other'
END
FROM test;
a | case
---+-------
1 | one
2 | two
3 | other
Source: Postgresql docs
SELECT Id, StudentName, StudentGender, StudentAge,
SUM (StudentAge) OVER (ORDER BY Id) AS RunningAgeTotal
FROM Students
Source: Calculating Running Total with OVER Clause and PARTITION BY Clause in SQL Server
SELECT Id, StudentName, StudentGender, StudentAge,
SUM (StudentAge) OVER (PARTITION BY StudentGender ORDER BY Id) AS RunningAgeTotal
FROM Students
Source: Calculating Running Total with OVER Clause and PARTITION BY Clause in SQL Server
-- Suppose this this the data
month | sale
-------+------
1 | 2021
2 | 2102
3 | 2150
4 | 2312
5 | 2425
-- Growth rate query
SELECT
month,
sum(sale) AS current_sale,
lag(sum(sale), 1) OVER (ORDER BY month) AS previous_month_sale,
(100 * (sum(sale) - lag(sum(sale), 1) OVER (ORDER BY month)) / lag(sum(sale), 1) OVER (ORDER BY month)) || '%' AS growth
FROM sales_data
GROUP BY 1
ORDER BY 1;
month | current_sale | previous_month_sale | growth
-------+--------------+---------------------+--------
1 | 2021 | |
2 | 2102 | 2021 | 4%
3 | 2150 | 2102 | 2%
4 | 2312 | 2150 | 7%
5 | 2425 | 2312 | 4%
Source: How To Calculate Month over Month Growth in PostgreSQL
Round numbers down using ::numeric
(shorthand) or round(val::numeric, 2)
.
Source: PostgreSQL ROUND Function
Add the percentage sign using concat(val, '%')
or the shorthand ||
for concatenation.
Source: 3 Ways to Format a Number as a Percentage in PostgreSQL
substring(string [from <str_pos>] [for <ext_char>])
Source: w3resource
Parameters:
string
: The main string from where the character to be extracted.str_pos
(Optional): The position of the string from where the extracting will be starting. If this parameter is omitted, the substring function will start at position 1 (which is the first position in the string).ext_char
(Optional): A number of characters to be extracted from the string. If this parameter is omitted, the substring function will return the entire string (from the start_position to the end of the string).
Source: w3resource
You use
IN
operator in theWHERE
clause to check if a value matches any value in a list of values.
value IN (value1,value2,...)
Source: Postgresql tutorial
The list of values can be a list of literal values such as numbers, strings or a result of a SELECT statement like this:
value IN (SELECT column_name FROM table_name);
SELECT
customer_id,
first_name,
last_name
FROM
customer
WHERE
customer_id IN (
SELECT customer_id
FROM rental
WHERE CAST (return_date AS DATE) = '2005-05-27'
)
ORDER BY customer_id;
Source: Postgresql tutorial
If anyone else wonders How to just get data type of a varible (not column) you can use the
pg_typeof(any)
function.
SELECT pg_typeof(your_variable);
Source: Stack Overflow
CROSS JOIN matches every row of the first table with every row of the second table. If the input tables have x and y columns, respectively, the resulting table will have x+y columns. Because CROSS JOINs have the potential to generate extremely large tables, care must be taken to use them only when appropriate.
SELECT ...
FROM table1
CROSS JOIN table2 ...
Source: Tutorialpoint
A INNER JOIN creates a new result table by combining column values of two tables (table1 and table2) based upon the join-predicate. The query compares each row of table1 with each row of table2 to find all pairs of rows, which satisfy the join-predicate. When the join-predicate is satisfied, column values for each matched pair of rows of table1 and table2 are combined into a result row.
An INNER JOIN is the most common type of join and is the default type of join. You can use INNER keyword optionally.
SELECT table1.column1, table2.column2...
FROM table1
INNER JOIN table2
ON table1.common_filed = table2.common_field;
Source: Tutorialpoint
recall the following general rule: each column in the SELECT statement should either be called in an aggregate function or be in the GROUP BY clause.
Source: How to Fix a 'Not a GROUP BY Expression' Error
ORA-00979 “Not a GROUP BY expression”
is an error issued by the Oracle database when the SELECT
statement contains a column that is neither listed in GROUP BY
nor aggregated. This error message can be confusing to beginners.
- find the
group-identifier
,max-value-in-group
in a sub-query - join your table to the sub-query with equality on both
group-identifier
andmax-value-in-group
E.g.
SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
SELECT id, MAX(rev) rev
FROM YourTable
GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
Source: Stack Overflow
thank u bruh