Notes from Stefan Frost's PostgreSQL performance talk

Identify Slow Queries

See talk

  • Set application_name per connection (useful when more than one app connects to the DB); see the example below
  • Native (declarative) partitioning is better than pg_partman
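
For example, application_name can be set per connection (the name here is illustrative):

    SET application_name = 'billing-worker';
    -- or directly in the libpq connection string:
    -- postgresql://user@host/db?application_name=billing-worker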

Logging

=> The more you log, the better you can locate performance bottlenecks. But the more you log, the more you slow down your system.
=> pgBadger (analyze the logs after the fact), pg_stat_statements (aggregated per-query statistics inside the database)
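
With pg_stat_statements loaded (shared_preload_libraries = 'pg_stat_statements' plus CREATE EXTENSION pg_stat_statements), a quick sketch for finding the worst offenders; note the column is total_exec_time on PostgreSQL 13+ and total_time before that:

    -- Top 10 queries by cumulative execution time
    SELECT query, calls, total_exec_time AS total_ms
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;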

Settings (a reload is enough, no restart needed):

  • log_min_duration_statement = {x: time in ms} => log every query that runs longer than x ms (prefer this over log_statement + log_duration)

  • log_line_prefix => add context (timestamp, pid, user, database, application_name) to every log line

  • log_lock_waits => log sessions that wait longer than deadlock_timeout for a lock

  • log_temp_files => log queries that spill to temporary files on disk

  • log_autovacuum_min_duration

  • Use pgBadger to analyze the logs
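
A hedged postgresql.conf sketch combining the logging settings above (the thresholds are illustrative; tune to your workload):

    log_min_duration_statement = 250            # log every query slower than 250 ms
    log_line_prefix = '%m [%p] %u@%d app=%a '   # timestamp, pid, user, db, application_name
    log_lock_waits = on                         # log waits longer than deadlock_timeout
    log_temp_files = 0                          # log every temp-file spill, regardless of size
    log_autovacuum_min_duration = 250           # log autovacuum runs over 250 ms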

Configuration

  • work_mem

  • maintenance_work_mem

  • effective_cache_size

  • shared_buffers

  • checkpoint_segments (removed in PostgreSQL 9.5; replaced by min/max_wal_size)

  • {min,max}_wal_size

  • checkpoint_timeout

  • checkpoint_completion_target

  • auto_explain

    • Logs the EXPLAIN plan for any query that runs longer than a threshold
    shared_preload_libraries = 'auto_explain'
    auto_explain.log_min_duration = 50   # ms
    auto_explain.log_nested_statements = true
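
A hedged postgresql.conf sketch for the memory and checkpoint settings above (values are illustrative, e.g. for a dedicated server with 32 GB RAM):

    shared_buffers = 8GB              # ~25% of RAM is a common starting point
    effective_cache_size = 24GB       # planner hint only, allocates nothing (~75% of RAM)
    work_mem = 64MB                   # per sort/hash node and per query; multiply by concurrency
    maintenance_work_mem = 1GB        # used by VACUUM, CREATE INDEX, ...
    min_wal_size = 1GB
    max_wal_size = 4GB                # replaces checkpoint_segments (PostgreSQL 9.5+)
    checkpoint_timeout = 15min
    checkpoint_completion_target = 0.9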
    

Note: EXPLAIN ANALYZE gives the same information as the auto_explain settings above, but running it on every query is too expensive; use it for targeted investigation. Paste plans into explain.depesz.com to visualize them.
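
A minimal sketch of running it by hand (table and value are illustrative):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM orders WHERE customer_id = 42;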

Vacuum

  • Marks dead rows as reusable space (a kind of garbage collector)
  • Bloat: accumulated dead tuples. Indexes bloat too; see the queries below
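
A hedged sketch for spotting bloat candidates via the statistics views (table name illustrative):

    -- Tables with the most dead tuples
    SELECT relname, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;

    -- Reclaim the space for reuse (VERBOSE reports what was done)
    VACUUM (VERBOSE) orders;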

Retrieving Data

  • Seq. Scan
  • Bulk Operation
  • Bitmap Scan

=> Use indexes as much as possible. Index-only scan: the result is read from the index alone, without touching the table heap (requires an up-to-date visibility map).
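
A sketch of a covering index that can enable an index-only scan (INCLUDE needs PostgreSQL 11+; names are illustrative):

    CREATE INDEX idx_orders_customer ON orders (customer_id) INCLUDE (total);
    EXPLAIN SELECT customer_id, total FROM orders WHERE customer_id = 42;
    -- look for "Index Only Scan"; it also needs a reasonably fresh visibility map (run VACUUM)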

Joins

  • Nested Loop

  • Merge Join

    • Order each table
    • Walk through both, return matches
    • Good for bulk
    • Sorting is expensive
  • Hash Join:

    • Scan One table and build a hash table
    • Step through other table using hash table
    • Slow start
    • Very fast but memory intensive
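
To see which join strategy the planner picks, the enable_* planner toggles can be flipped per session for experimentation (never in production config); a sketch with illustrative tables:

    EXPLAIN SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;
    SET enable_hashjoin = off;   -- steer the planner away from a hash join
    EXPLAIN SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;
    RESET enable_hashjoin;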

Aggregates

  • Group Agg
    • Order / Sort Input
    • Step through each record; if it matches, combine
    • Expensive Sort
  • Hash Agg
    • Scan Table, building Hash tbl
    • Hash table matches are combined
    • Memory intensive
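
Which aggregate strategy wins depends largely on work_mem and input ordering; a session-level sketch (table illustrative):

    SET work_mem = '64kB';   -- tiny budget: the planner tends toward a sort-based GroupAggregate
    EXPLAIN SELECT customer_id, count(*) FROM orders GROUP BY customer_id;
    RESET work_mem;          -- with more memory, HashAggregate usually wins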

Query Plan

Database gathers & uses statistics

  • ANALYZE
  • VACUUM ANALYZE
  • pg_statistic (exposed via the pg_stats view)
  • Autovacuum
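
A minimal sketch of refreshing and inspecting the statistics (table name illustrative):

    ANALYZE orders;
    SELECT attname, n_distinct, null_frac
    FROM pg_stats
    WHERE tablename = 'orders';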

Solution

  • Indexes
    • PostgreSQL index types & where to use them (B-tree, GIN, GiST, BRIN, hash, ...)
  • Increase work_mem (can also be set per session or per query; see the sketches after this list)
  • ANALYZE tables (to refresh the planner statistics)
  • Prepared Queries
    • Avoids repeated planning cost
    • First-five-executions rule: the first five runs are planned individually; if a generic plan performs as well, it is reused from then on
  • Don't use SELECT * (especially if you have TOASTed columns)
  • Prefer JOINs.
  • Avoid NOT EXISTS if possible
  • Use CTEs (Common Table Expressions): if the CTE part is expensive to compute but small in size, it is absolutely worth it :)
  • For a really fast approximate count(*), read the planner statistics (pg_class.reltuples) instead of counting; it will not be an exact count
  • Use partial (conditional) indexes that exclude NULL or irrelevant rows
  • Make sure your query-plan uses your index
  • Drop unused indexes
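
A few hedged SQL sketches for the tips above (all table, index, and statement names are illustrative):

    -- Per-transaction work_mem bump; reverts automatically at COMMIT
    BEGIN;
    SET LOCAL work_mem = '256MB';
    -- ... run the expensive sort/hash query here ...
    COMMIT;

    -- Prepared statement: plan once, execute many times
    PREPARE orders_by_customer (int) AS
      SELECT id, total FROM orders WHERE customer_id = $1;
    EXECUTE orders_by_customer(42);

    -- Fast approximate count from planner statistics
    SELECT reltuples::bigint AS approx_rows FROM pg_class WHERE relname = 'orders';

    -- Partial index that skips rows most queries never touch
    CREATE INDEX idx_orders_open ON orders (customer_id) WHERE status <> 'done';

    -- Candidate unused indexes (check all replicas before dropping!)
    SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0;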