- Just run PostgreSQL on the host
- Prefer bare metal when you have more serious DB needs
- Disable the Linux OutOfMemory killer (see #3)
- shared_buffers: "The shared_buffers configuration parameter determines how much memory is dedicated to PostgreSQL use for caching data." (source: #1)
- Below 2GB of system memory (what kind of memory is that for a DB? But seriously...): use 20% of it.
- Between 2-32GB of system memory (now you're talking): use 25% of it.
- Over 32GB of system memory (whoa): use a fixed 8GB. Done, let's move on.
- Caveat: "Even on a modern Linux system, the stock kernel will likely not allow setting shared_buffers to over 32MB without adjusting kernel settings first." (see #1)
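The three sizing rules above can be sketched as a quick helper (sizes in MB; the function name and layout are mine, not anything official):

```python
# Sketch of the shared_buffers sizing rules above; sizes are in MB.
def shared_buffers_mb(system_mb: int) -> int:
    if system_mb < 2 * 1024:        # below 2GB: use 20%
        return system_mb // 5
    if system_mb <= 32 * 1024:      # 2-32GB: use 25%
        return system_mb // 4
    return 8 * 1024                 # over 32GB: fixed 8GB

print(shared_buffers_mb(16 * 1024))  # a 16GB box -> 4096 (i.e. 4GB)
```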
- work_mem: This size is applied to each and every sort done by each user, and complex queries can use multiple working memory sort buffers.
- Caveat: In 8.3+ you can use log_temp_files to figure out if sorts are using disk instead of fitting in memory.
- Start low, say 32MB, then increase based on the largest size seen in log_temp_files: set work_mem to 2 or 3 times that size.
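As a concrete (hypothetical) example of that measure-then-adjust pass in postgresql.conf:

```ini
# Hypothetical work_mem tuning pass, following the start-low advice above:
work_mem = 32MB          # starting point
log_temp_files = 0       # log every temp file with its size (8.3+)
# ...run the workload, check the logs; if the largest temp file logged
# was around 50MB, come back and set:
# work_mem = 128MB       # roughly 2-3x the largest temp file seen
```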
- maintenance_work_mem: used for operations like vacuum.
- "Something in the 256MB range has anecdotally been a reasonable large setting here."
- Others suggest 10% of system memory, not to exceed 1GB.
- Only consider going higher if you encounter problems with VACUUM.
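The 10%-capped-at-1GB suggestion is easy to express (again just a sketch of the rule of thumb above, in MB):

```python
# maintenance_work_mem rule of thumb: 10% of system memory, capped at 1GB.
def maintenance_work_mem_mb(system_mb: int) -> int:
    return min(system_mb // 10, 1024)

print(maintenance_work_mem_mb(8 * 1024))  # 8GB box -> 819 (~800MB)
```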
- effective_cache_size: should be set to an estimate of how much memory is available for disk caching by the operating system and within the database itself, after taking into account what's used by the OS itself and other applications.
- Common prescriptions are between 50% and 75% of system memory. (source #1)
- Or set to the amount of file system cache available. (source #2)
- Typically start with 50% and base further tuning on evidence and/or realistic tests. (source #2)
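One way to get a number for "file system cache available" on Linux is to read /proc/meminfo (a rough sketch; the Cached field ignores dirty pages and other subtleties):

```python
# Rough estimate of the OS page cache on Linux, as one input to an
# effective_cache_size estimate; /proc/meminfo values are in kB.
def os_cached_mb() -> int:
    with open("/proc/meminfo") as f:
        fields = dict(line.split(":", 1) for line in f)
    return int(fields["Cached"].split()[0]) // 1024
```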
Suggested starting config from #2:
wal_buffers = 16MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 10m-30m # Depends on restart time
checkpoint_segments = 32 # To start.
Notes:
- "Adjust checkpoint_segments so that checkpoints happen due to timeouts rather than filling segments." (see #2) Do this by looking for checkpoint entries in the logs and checking whether they happen more often than checkpoint_timeout.
- The WAL (write-ahead log) can take up to 48MB x checkpoint_segments on disk. Make sure this is OK.
- Restarting PostgreSQL can take up to checkpoint_timeout, though typically less, so don't make it too long, just long enough.
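The 48 x checkpoint_segments figure falls out of segment math: each WAL segment is 16MB, and up to roughly (2 + checkpoint_completion_target) x checkpoint_segments segments can exist at once, so with a completion target near 1 that's ~48MB per configured segment. A sketch:

```python
# Rough upper bound on WAL disk usage (in MB): about
# (2 + checkpoint_completion_target) * checkpoint_segments files of 16MB each.
def max_wal_mb(checkpoint_segments: int, completion_target: float = 0.9) -> int:
    return int((2 + completion_target) * checkpoint_segments * 16)

print(max_wal_mb(32))  # the starting config above -> 1484 (~1.5GB)
```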
- random_page_cost: "This setting suggests to the optimizer how long it will take your disks to seek to a random disk page, as a multiple of how long a sequential read (with a cost of 1.0) takes." (source #1)
```scala
// Silly Scala (no longer pseudo) code for picking a random_page_cost value
// Suggestions from source #2
sealed trait StorageType
case class RaidArray(level: Int) extends StorageType
case object San extends StorageType
case object AmazonEBS extends StorageType

def randomPageCost(storageType: StorageType): Double = storageType match {
  case RaidArray(10) => 3.0
  case San           => 2.0
  case AmazonEBS     => 1.1
  case _             => 4.0 // fall back to the PostgreSQL default
}
```
Suggestion for CSV format output:
log_destination = 'csvlog'
log_directory = 'pg_log'
logging_collector = on
log_filename = 'postgres-%Y-%m-%d_%H%M%S'
log_rotation_age = 1d
log_rotation_size = 1GB
log_min_duration_statement = 250ms
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
(source: #2)
Great to know. I am slowly gathering research while we attempt to migrate from MySQL to primarily PostgreSQL at work (plus moving non-relational data out to more relevant datastores), so we can start off on the right foot. Anything generic I am Gisting for others to use. :)
I have used PostgreSQL as a developer many times before, but always had a PostgreSQL DBA at work to refer to, so this time I have to delve deeper.