- Local env setup
- Bonus: CloudSQL methods
- Configuration: most useful configs for balanced performance
- Basics of query dynamics:
- How a Query Works
- JOINs
- CTEs
- Transactions
- Functions
- Window Functions
- Aggregations
- Data Generation
- Constraints
- Useful Queries For Daily Business
- Some of main 4-Cast queries
- Performance Hints: How to
explain
the Queries- Indexes
- Statistics, Vacuum, Analyze, Re-Index
- Maintenance:
- Backup & Restore
- Storage: All About Sizes (see: link)
- Upgrades
- Migrations
- Bonus:
- Transactions
- Isolation Levels
psql
Tips- Columnar
➜ docker run --name demo_server -e POSTGRES_PASSWORD=secret -e POSTGRES_DB=demo -e POSTGRES_USER=admin -p 54455:5432 postgis/postgis:14-3.2-alpine -d
# overwrite `application_name` to enable transparant monitoring
➜ docker exec -it demo_server psql -U admin -d 'dbname=demo application_name=ilhan_cli_app'
# OR
➜ docker -e PGAPPNAME=ilhan_cli_app exec -it demo_server psql -U admin -d demo
Get cloud proxy binaries from link
# Expose database to local access
PRROJECT_ID=... \
REGION=... \
INSTANCE_NAME=... \
./cloud_sql_proxy -instances=${PRROJECT_ID:-myproject}:{REGION:-myregioon}:{INSTANCE_NAME:-myinstance}=tcp:54552
- Use
clone
functionality to have an exact copy of current database - Run experiments
or
heavy qureies (over yearly reporting queries) against clone db - Delete clones as soon as you finished your experiment
Use SHOW all;
to get all available configurations.
Important values:
- effective_cache_size
- max_connextions
- maintenance_work_mem
- statement_timeout
- log_min_duration_statement
- max_locks_per_transaction
Recommendations
- Use
~/.psqlrc
for personalized values - Use
LOCAL
setting changes for transaction based changes SET ...
is only effectivfe for current session- Database Connections
- Keep
max_connextions
low - Use
pg_stat_*
for overviews. PS: Google CloudSQL's query-insights also usespg_stat_activity
- Keep
- Query Execution Order: Top to down execution order => FROM, JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> DISTINCT [ON] -> LIMIT
- When more than one table reference is listed in the
FROM
clause, the tables are cross-joined (the Cartesian product) - Table Functions can be handy for query re-use
JOIN
clauses nest left-to-right by default. Parentheses can be used to control the JOIN order
CROSS JOIN
: For every possible combination of rows from T1 and T2 a row consisting of all columns in T1 followed by all columns in T2. T1(N rows), T2(M rows) => N * M rows.FROM T1 CROSS JOIN T2
is equivalent toFROM T1 INNER JOIN T2 ON TRUE
. It is also equivalent toFROM T1, T2
- The words INNER and OUTER are optional in all forms.
- INNER is the default.
- LEFT, RIGHT and FULL imply an
OUTER JOIN
. JOIN T1, T2 USING (a, b)
is equevalant to.. ON T1.a = T2.a AND T1.b = T2.b
. This is handy when joined column names are same.NATURAL
is a shorthand form of USING: it forms a USING list consisting of all column names that appear in both input tables.USING
is safe for schema changes,NATURAL
raises conflicting risks
q1 UNION [ALL] q2
,q1 INTERSECT [ALL] q2
andq1 EXCEPT [ALL] q2
can be used to combine result of two queries- Ordering (like
ORDER BY sort_expression1 [ASC | DESC] [NULLS { FIRST | LAST }]
) happens at the end of data fetching and is one of the most expensive operations - VALUES (
VALUES (1, 'one'), (2, 'two'), (3, 'three')
) can be used to generate intermediate tables to ease queries RETURNING *
give control over output of any query- Use range data types when applicable (see link)
demo=# create table api(jdoc jsonb); CREATE TABLE demo=# CREATE INDEX idxgin ON api USING GIN (jdoc); CREATE INDEX demo=#
CTEs
CTEs can split complex queries into multiple small portions
- CTEs can be used
RECURSIVE
over row's own value. TIP: It is iterable, not recursion.
- B-Tree: Use for operations like
<
OR<=
OR=
OR>=
OR>
together withBETWEEN
,IN
,IS NULL
orIS NOT NULL
- Hash: Use for
=
operator. ie: UUID - GiST: A combination of index algorithms. Use for ...
- GIN: Inverted index. Use for text-search, ranking ...
- BRIN: Best for repeating data sets (dates, integers, etc)
-
SQL Dump & Restore
- Exporting:
man pg_dump
(also see pg_dumpall) - Importing:
man pg_restore
(orpsql {DB_NAME} < dump.file
). Don't forget toANALYZE
after imports!
- Exporting:
-
File system level backup
-
Continuous archiving => use cloud :)
psql [-h .. -d .. -U .. -P ..]
OR psql 'dbname=.. user=.. application_name=..'
OR APPNAME=.. PGPASSWORD=.. PGHOST=.. psql
\?
-> List of all available apis (likehelp
)\d[+]
-> show table definition\di
-> list indexes\di <schema>.<tablename>
for table specific\dp
-> list access priviliges in tables, views, functions etc\g
-> equavalant to;
\sf
-> show function definition script\sv
-> show view definition script\g
-> equavalant to;
\q
-> quit\i <filename>.sql
-> execute sql file from the session\o <filename>
-> send all query results to specified file\copy ...
-> perform sql copy for data ingestion/extraction
Relational databases write data to disk in a row-based aproach
Row Based
Saved as tuples of row into disk. Needs a scan on disk in complex queries.
select value from data where name like 'fra%';
will first fetch all the table data to memory. This is due to storage structure undertheneeth.
Name | City | ... | Some Value |
---|---|---|---|
Frank | Berlin | ... | 123 |
... | ... | ... | ... |
Franz | Potsdam | ... | 321 |
Column Based
Each column saved to disk as a block. Thus, compression is highly possible.
Also, select value from data where name like 'fra%';
will fetch only data of value
column.
Name | City | ... | Some Value |
---|---|---|---|
Franz | Potsdam | ... | 125 |
Frank | col2_val | ... | 222 |
Henrik | col2_val | ... | 121 |
... | ... | ... | ... |
Zeynep | Bitlis | XYZ | 111 |
Example
- ClickHous: Fully opensource analytics data capable of petabyte scale queries in seconds.
- Citus Data: Fully open source Postgresql extension with distributed capacity.
- Delegate heavy-lifting to database server
- Minimize network latency, computing time & costs
- Reduces compelxity
- Rethink your data design! Use jsonb, generated-columns and other features
- Use columnar data storage for long-term queries (like over-yearly calculations)
HELPER QUERIES
Basic JOIN Examples
Table Functions
Columnar DB Example (Citus Data)
Citus is an fully opensource Postgresql extension maintained by Microsoft for distributed PG.
docker run --name columnar_db -e POSTGRES_PASSWORD=skecret -e POSTGRES_DB=demo -e POSTGRES_USER=admin -p 1111:5432 citusdata/citus:11.0 docker exec -it columnar_db psql -U admin -d demo
WHERE
Columnar ShinesEven though everytinh in both tables is almost identical (only value contens are different, sizes & types are same), normal table answers in 1321 miliseconds but columnar only takes 20!
A factor of 66 times faster.
B-tree Indexes
Below is a demonstration of aa query with and without index.
Look carefully at the query plan of second (inclusive) index!
BRIN Index