Skip to content

Instantly share code, notes, and snippets.

@ilhnctn
Last active September 11, 2022 14:46
Show Gist options
  • Save ilhnctn/8486066bbd9f92317a1ba7fd3f6284f4 to your computer and use it in GitHub Desktop.
Save ilhnctn/8486066bbd9f92317a1ba7fd3f6284f4 to your computer and use it in GitHub Desktop.

SQL with Postgresql

Agenda

  • Local env setup
    • Bonus: CloudSQL methods
  • Configuration: most useful configs for balanced performance
  • Basics of query dynamics:
    • How a Query Works
    • JOINs
    • CTEs
    • Transactions
    • Functions
    • Window Functions
    • Aggregations
    • Data Generation
    • Constraints
  • Useful Queries For Daily Business
    • Some of main 4-Cast queries
  • Performance Hints: How to explain the Queries
    • Indexes
    • Statistics, Vacuum, Analyze, Re-Index
  • Maintenance:
    • Backup & Restore
    • Storage: All About Sizes (see: link)
    • Upgrades
    • Migrations
  • Bonus:
    • Transactions
    • Isolation Levels
    • psql Tips
    • Columnar

Local env setup

➜ docker run --name demo_server -e POSTGRES_PASSWORD=secret -e POSTGRES_DB=demo -e POSTGRES_USER=admin -p 54455:5432 postgis/postgis:14-3.2-alpine -d

# overwrite `application_name` to enable transparant monitoring
➜ docker exec -it demo_server psql -U admin -d 'dbname=demo application_name=ilhan_cli_app'

# OR
➜ docker -e PGAPPNAME=ilhan_cli_app exec -it demo_server psql -U admin -d demo

CloudSQL

Get cloud proxy binaries from link

# Expose database to local access
PRROJECT_ID=... \
REGION=... \
INSTANCE_NAME=... \
./cloud_sql_proxy -instances=${PRROJECT_ID:-myproject}:{REGION:-myregioon}:{INSTANCE_NAME:-myinstance}=tcp:54552
  • Use clone functionality to have an exact copy of current database
  • Run experimentsorheavy qureies (over yearly reporting queries) against clone db
  • Delete clones as soon as you finished your experiment

Configuration

Use SHOW all; to get all available configurations.

Important values:

  • effective_cache_size
  • max_connextions
  • maintenance_work_mem
  • statement_timeout
  • log_min_duration_statement
  • max_locks_per_transaction

Recommendations

  • Use ~/.psqlrc for personalized values
  • Use LOCAL setting changes for transaction based changes
  • SET ... is only effectivfe for current session
  • Database Connections
    • Keep max_connextions low
    • Use pg_stat_* for overviews. PS: Google CloudSQL's query-insights also uses pg_stat_activity

SQL Query Dynamics

  • Query Execution Order: Top to down execution order => FROM, JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> DISTINCT [ON] -> LIMIT
  • When more than one table reference is listed in the FROM clause, the tables are cross-joined (the Cartesian product)
  • Table Functions can be handy for query re-use

JOINs

JOIN clauses nest left-to-right by default. Parentheses can be used to control the JOIN order

  • CROSS JOIN: For every possible combination of rows from T1 and T2 a row consisting of all columns in T1 followed by all columns in T2. T1(N rows), T2(M rows) => N * M rows.
  • FROM T1 CROSS JOIN T2 is equivalent to FROM T1 INNER JOIN T2 ON TRUE. It is also equivalent to FROM T1, T2
  • The words INNER and OUTER are optional in all forms.
  • INNER is the default.
  • LEFT, RIGHT and FULL imply an OUTER JOIN.
  • JOIN T1, T2 USING (a, b) is equevalant to .. ON T1.a = T2.a AND T1.b = T2.b. This is handy when joined column names are same.
  • NATURAL is a shorthand form of USING: it forms a USING list consisting of all column names that appear in both input tables.
  • USING is safe for schema changes, NATURAL raises conflicting risks

Query Generics

  • q1 UNION [ALL] q2, q1 INTERSECT [ALL] q2 and q1 EXCEPT [ALL] q2 can be used to combine result of two queries
  • Ordering (like ORDER BY sort_expression1 [ASC | DESC] [NULLS { FIRST | LAST }]) happens at the end of data fetching and is one of the most expensive operations
  • VALUES (VALUES (1, 'one'), (2, 'two'), (3, 'three')) can be used to generate intermediate tables to ease queries
  • RETURNING * give control over output of any query
  • Use range data types when applicable (see link)

demo=# create table api(jdoc jsonb); CREATE TABLE demo=# CREATE INDEX idxgin ON api USING GIN (jdoc); CREATE INDEX demo=#

CTEs

CTEs can split complex queries into multiple small portions

  • CTEs can be used RECURSIVE over row's own value. TIP: It is iterable, not recursion.

Performance: Indexes

  • B-Tree: Use for operations like < OR <= OR = OR >= OR > together with BETWEEN, IN, IS NULL or IS NOT NULL
  • Hash: Use for = operator. ie: UUID
  • GiST: A combination of index algorithms. Use for ...
  • GIN: Inverted index. Use for text-search, ranking ...
  • BRIN: Best for repeating data sets (dates, integers, etc)

Maintenance: Backup & Respore

  • SQL Dump & Restore

    • Exporting: man pg_dump (also see pg_dumpall)
    • Importing: man pg_restore (or psql {DB_NAME} < dump.file). Don't forget to ANALYZE after imports!
  • File system level backup

  • Continuous archiving => use cloud :)

Bonus

psql

psql [-h .. -d .. -U .. -P ..] OR psql 'dbname=.. user=.. application_name=..' OR APPNAME=.. PGPASSWORD=.. PGHOST=.. psql

  • \? -> List of all available apis (like help)
  • \d[+] -> show table definition
  • \di -> list indexes \di <schema>.<tablename> for table specific
  • \dp -> list access priviliges in tables, views, functions etc
  • \g -> equavalant to ;
  • \sf -> show function definition script
  • \sv -> show view definition script
  • \g -> equavalant to ;
  • \q -> quit
  • \i <filename>.sql -> execute sql file from the session
  • \o <filename> -> send all query results to specified file
  • \copy ... -> perform sql copy for data ingestion/extraction

Columnar

Relational databases write data to disk in a row-based aproach

Row Based Saved as tuples of row into disk. Needs a scan on disk in complex queries. select value from data where name like 'fra%'; will first fetch all the table data to memory. This is due to storage structure undertheneeth.

Name City ... Some Value
Frank Berlin ... 123
... ... ... ...
Franz Potsdam ... 321

Column Based Each column saved to disk as a block. Thus, compression is highly possible. Also, select value from data where name like 'fra%'; will fetch only data of value column.

Name City ... Some Value
Franz Potsdam ... 125
Frank col2_val ... 222
Henrik col2_val ... 121
... ... ... ...
Zeynep Bitlis XYZ 111

Example

Potensial Candidates

  • ClickHous: Fully opensource analytics data capable of petabyte scale queries in seconds.
  • Citus Data: Fully open source Postgresql extension with distributed capacity.

Take-aways

  • Delegate heavy-lifting to database server
  • Minimize network latency, computing time & costs
  • Reduces compelxity
  • Rethink your data design! Use jsonb, generated-columns and other features
  • Use columnar data storage for long-term queries (like over-yearly calculations)
@ilhnctn
Copy link
Author

ilhnctn commented Sep 11, 2022

Also see helper-queries

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment