Skip to content

Instantly share code, notes, and snippets.

@brucecrevensten
Created August 1, 2012 22:22
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save brucecrevensten/3231238 to your computer and use it in GitHub Desktop.
Save brucecrevensten/3231238 to your computer and use it in GitHub Desktop.
Unboxing PostgreSQL - OSCON 2012 - Notes

Postgres configuration and ops considerations -- notes taken during OSCON 2012

Presented by Christophe Pettus.

Authors site: thebuild.com

Nomenclature

  • database is set of tables & schema objects,
  • mysql "databases" === postgres "Schema"
  • "cluster" means "collection of databases"
  • pg_ctl: start/stop
  • $PGBASE
  • postgresql.conf
  • security: "role" === object that can own other objects and that has privileges
  • "user" === role that can log into the system; otherwise synonyms
  • important params in config: logging, memory, checkpoints, planner; done.

Tuning

  • use SSDs; put the transaction log onto spinning disk if you want. use server-grade for graceful memory flush in case of power failure.

Tuning logging

  • change logging first to get the data you need:
  • where to log? syslog, or, use CSV format to files.
  • log_min_duration_statement = 250ms <<--- important!for finding slow things
  • log_lock_waits = on <-- if anything's waiting for locks, log

Tuning memory

memory configs:

  • dont' run other servers on the same thing(?)
  • shared_buffers -- best == 8GB >= 32GB, otherwise 25% total system.
  • look for SHMALL and SHMMAX in the syslogs in the kernel -- tweak kernel params: -- calculate shared_memory in bytes + 20%. huge decimal number. -- sudo sysctl -w kernel.shmmax = (value) -- sudo sysctl -w kernel.shmmall = (value) / 4096

work_mem:

  • start low: 32-64MB.
  • look for lines about "temp files," to see if there's needing to be more temp memory.
  • set to 2 - 3x largest temp file you see. (why bigger? more efficient on disk, need more actual hot RAM).
  • dont' exceed 5-10% of systems RAM, because it's the amount of memory per planner node.
  • can configure per-session

maintenance_work_mem

  • 10% of sysram up to 1GB
  • if vaccuum problems, maybe need more.

effective_cache_size

  • set to amount of file system cache available.
  • no idea what to set? set it to 50% system RAM
  • this isn't allocated, it's a hint to the planner to decide how much RAM is available.

General points:

  • random_page_cost && work_mem === biggest performance gains
  • prefer xfs or ext4 on linux.

Checkpoint tuning

a complete flush or dirty buffers to disk. two params control this:

  • number of WAL segments written to disk,
  • whenever a timeout occurs.

tuning these parameters:

  • wal_buffers # 16mb
  • checkpoint_completion_target # 0.9
  • checkopoint_timeout # 10m-30m # longer to start, less IO while operating
  • checkpoint_segments # 32

monitoring:

  • look for checkpoint entries in log,
  • happening how often? more than checkpoint_timeout? if so, it's exhausting the WAL segments more quickly, so bump up the checkpoint segments until they're less frequent than the timeout

Planner tuning

  • effective_io_concurrency: set to number of IO channels, otherwise ignore; if SSD with 32 channels, set to 32, etc. random_page_cost - 3.0 is typical RAID10, 2.0 for SAN, 1.1 for AMazone EBS. Radio between time to grab random disk page vs. sequential. SSD ~ 1.5 range. controls index v. sequential search. lower#use index.

do not touch

  • fsync = on. never change. this controls if postgres will flush then wait for the result.
  • synchronous_commit = on; you can turn this off (won't corrupt DB if fail);

Concepts

Write-ahead log (WAL)

  • continuous stream of comitteed database modifications, broken into 16mb segments
  • starts with DB cluster creation, lasts forever
  • checkpoints mean "last consistent state", segements before that WAL can be thrown away
  • PUT IT ON ITS OWN FILE SYSTEM because it's append-only, basically. own set of disks, etc. it stays put.

MVCC (multiversion concurrency control)

  • helps prevent locking, alternative to pessmisitic locking; allows higher performance
  • writers don't block readers, readers get old version of row.
  • writers block writers to the same row.
  • multiple versions of row may be in DB; deleted/updated aren't immediately removed.
  • VACUUM cleans tuples(rows) that aren't seen by anything/anyone/any transaction.
  • post 8.0, autovacuum happens; good idea to do manual vacuum after bulk update/delete operations
  • ANALYZE regenerates table stats to help make good guesses for how to execute queries; always do this after major database changes, such as restore from bacup.
  • "share" vs "exclusive" locks exist.
  • surprising locks: table-level locks when you add a new non-NULL column -- fix by creating it NULL then changing the column later

Transaction modes

  • read committed
  • repeatable read
  • serializable

Schema design & operations

  • keep data in normal form, don't fear joins
  • "fast/slow" rule: "fast data" changes a lot, "slow data" infrequently -- put these into separate tables. Slow data tends to be the parent of other data via foreign keys.
  • some indexing strategies: -- index should be selective in the sense that when the index is used, it should return a small number of rows. -- partial index: an index that only applies to certain/specific conditions (along the lines of "index where(clause)") -- index should be frequently used.
  • drop unused indexes. create indexes on the basis of real-life needs, and look for sequential scans
  • built-in views to check indexes: -- pg_stat_user_tables -- how many times a sequential scan has been done, -- pg_stat_user_indexes -- how often an index has been used.
  • SELECT COUNT(*) from myHugeTable is implemented as a full table scan. Don't do it. pg_stat_activity has an approximate, but try and avoid it. It's not a fast-performant thing on Postgres.
  • taming autovacuum -- you can cut down the number of workers, making it run more frequently, etc. other sections in the configuration file.
  • bulk loading: use COPY, not INSERT.

Debugging

"this query is slow"

  • EXPLAIN or EXPLAIN ANALYZE -- gets the query plan
  • http://explain.depesz.com
  • estimate vs. actual rows return means planner's confused
  • nested loops often mean joins that you can't use an index for

"the DB is slow"

  • pg_stat_activity -- is it waiting on a query? etc
  • tail -f logs
  • pg_locks, in connection with pg_stat_activity.

System/network

  • cloud hosting has terrible IO; since DBs are IO bound, you want to get as much RAM as you can (up to 2x DB size), CPU capacity isn't that important as RAM; always replicate.
  • store configurations in VCS
  • our-own-hardware: -- get SSDs, otherwise SAS drives.
    -- RAID10; -- put pg_xlog on its own volume; -- move pg_stat_tmp to a RAMdisk if you want to (transient data, write intensive)

if you have little SSD, put your busy indexes / tables onto it.

monitoring

  • nagios: disk, cpu, mem, (if used) replication log
  • "checkpostgres.pl" from Bucardo.org <- use
  • pgAdmin3 for management, handy
  • log analysis: pgFouine (Traditional, not maintained much); pgbadger (new, active).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment