@hgmnz
Created August 5, 2009 12:32
# PostgreSQL Notes
For first-time Postgres users, the tutorial section of the documentation is a must-read: http://www.postgresql.org/docs/8.4/interactive/tutorial.html
### Installation
The easiest way is to use your package manager (yum, apt-get, etc.). On Mac OS, MacPorts makes it easy. Building from source is straightforward as well; there are plenty of resources online.
### Get acquainted with psql
To enter the Postgres command prompt:

```sh
psql -U your_username -d the_database
```
If there is a user in your database that matches your Unix username, the -U parameter is optional (psql will use your username). On development machines it is a good idea to create the matching Postgres user; in that case there is no need to specify a username and password in database.yml.
From within a Rails app with database.yml configured, script/dbconsole works as expected (it drops you into psql).
Within the psql command prompt you can list databases with \l, describe database objects with \d, connect to other databases with \c, and perform many other useful tasks. Don't be shy about using \? for help, and \h <command> for help on a specific SQL command.
A useful trick is to use \x before running a query that will return many columns. \x activates expanded mode. Try it.
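As a sketch, a short session using these meta-commands (db_name and the users table are hypothetical) might look like:

```
$ psql -U postgres -d db_name
db_name=# \l
db_name=# \d users
db_name=# \x
Expanded display is on.
db_name=# select * from users limit 1;
db_name=# \q
```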
### Rubygem/Postgres adapter
There are many adapters in the wild, some native and some pure Ruby. The one that is known to work for me on both Mac OS and Linux is the following. In the dark Windows days, the easiest way was to use the pure Ruby driver, although with some work the native driver could also be installed (it involved moving DLLs from the Postgres installation directory to the Ruby bin directory, if I remember correctly).
```sh
gem list --local | grep postgres
postgres (0.7.9.2008.01.28)
```
### Create db_user role (user)
```sh
sudo -u postgres createuser --no-superuser --createdb --no-createrole --login --pwprompt --encrypted -h 127.0.0.1 -p 5432 db_user
```
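The same role can also be created from a psql prompt (a sketch; the password is a placeholder):

```sql
CREATE ROLE db_user LOGIN CREATEDB NOSUPERUSER NOCREATEROLE
  ENCRYPTED PASSWORD 'changeme';  -- placeholder password
```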
## Restore Database
### Create an empty database from template0.
Via SQL (i.e., in a psql prompt):

```sql
create database db_name owner=db_user template=template0 encoding='utf8';
```
Via createdb:

```sh
createdb -U postgres -O db_user -T template0 -E 'utf8' db_name
```
### Drop a DB
```sh
dropdb -U postgres db_name
```
## Backup Database
### Dump Entire Database to Custom Format
```sh
pg_dump -U postgres -Fc -v -f db_name-20081107.bak db_name
```
Optional: -Z9 (maximum compression level).
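For instance, the same dump with maximum compression (same hypothetical file name as above):

```sh
pg_dump -U postgres -Fc -Z9 -v -f db_name-20081107.bak db_name
```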
### Dump users and people to Custom Format
```sh
pg_dump -U postgres -t users -t people -Fc -v -f db_name-user_and_people-20081107.bak db_name
```
### Restore from custom format.
```sh
pg_restore -U postgres -d db_name -v db_name.bak
```
pg_restore also takes options for restoring specific objects, provided the dump uses the custom format (-Fc).
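For example (assuming the custom-format dump created above), you can list a dump's contents and restore a single table:

```sh
pg_restore -l db_name.bak                                    # list the objects in the dump
pg_restore -U postgres -d db_name -t users -v db_name.bak    # restore only the users table
```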
### Size of Database
```sql
select pg_size_pretty( pg_database_size('db_name') );
```
### Size of a table and its indexes
```sql
select pg_size_pretty( pg_total_relation_size('some_table') );
```
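Building on pg_total_relation_size, a handy query (a sketch) for spotting the largest tables in the current database:

```sql
select relname, pg_size_pretty(pg_total_relation_size(oid)) as total_size
from pg_class
where relkind = 'r'                              -- ordinary tables only
order by pg_total_relation_size(oid) desc
limit 10;
```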
### Troubleshooting
On Linux, if the server will not start, run pg_ctl directly (rather than through an init script) so that errors are printed to the terminal:

```sh
sudo -u postgres pg_ctl -D /var/lib/pgsql/data start
```
This is particularly useful when tweaking the shared_buffers parameter. If PostgreSQL doesn't start, usually shmmax must be increased, and the "request size" in the error log will tell you exactly what this kernel parameter must be set to.
### DBLink
DBLink allows you to execute queries on a remote database. It is part of the contrib modules.
#### DBLink Installation
First, make sure you've installed the postgresql-contrib package. It includes dblink.sql, which is installed into a database by running:

```sh
psql db_name -U postgres < /usr/share/pgsql/contrib/dblink.sql
```
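For reference, on PostgreSQL 9.1 and later, contrib modules are installed as extensions rather than via SQL scripts:

```sql
CREATE EXTENSION dblink;
```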
#### Querying via DBLink
Unfortunately, all column types of a query must be specified. For example:
```sql
select users.*
from dblink('dbname=db_name host=<remote_ip_addr> password=<remote_password> user=<remote_user>',
            'select * from users')
as users(id integer, login varchar(255), email varchar(255),
         crypted_password varchar(40), salt varchar(40), created_at timestamp,
         updated_at timestamp, remember_token varchar(255),
         remember_token_expires_at timestamp, person_id integer,
         user_status_id integer);
```
#### Moving data from a remote server to the local DB
This is as simple as an insert ... select, where the from portion comes from the remote server via dblink:

```sql
insert into users
select remote_users.*
from dblink('dbname=db_name host=<remote_ip_address> password=<remote_password> user=<remote_user>',
            'select * from users')
as remote_users(id integer, login varchar(255), email varchar(255),
                crypted_password varchar(40), salt varchar(40), created_at timestamp,
                updated_at timestamp, remember_token varchar(255),
                remember_token_expires_at timestamp, person_id integer,
                user_status_id integer);
```
### Basic Server Setup on Linux
This section is mostly useful for setting up a server for production use:
#### Set password
Change the PostgreSQL postgres user's password; we will not be able to access the server otherwise. As the postgres Linux user, execute the psql command:
```sh
sudo -u postgres psql template1
```
At the new prompt, change the password:

```sql
ALTER USER postgres WITH PASSWORD 'new_password';
\q
```
### Update postgresql.conf File
Set the listen_addresses option to '*' if this server will be accessed from other hosts.
### Performance tuning
Increase the "System V IPC" shared memory limits in the Linux kernel. See http://www.postgresql.org/docs/current/static/kernel-resources.html
As a rule of thumb, set the shmmax value to 1/4 of the available memory on the system, in bytes. If this is a dedicated postgres server, set to half the RAM.
For example, on a server with 4GB of RAM:

4GB / 4 = 1GB = 1024MB = 1048576KB = 1073741824 bytes

```sh
sudo /sbin/sysctl -w kernel.shmmax=1073741824
sudo /sbin/sysctl -w kernel.shmall=2097152
```
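The arithmetic above can be double-checked in the shell (a sketch; a 4GB machine is assumed):

```shell
# Rule-of-thumb shmmax for a 4GB machine (assumed size): 1/4 of RAM, in bytes
ram_bytes=$((4 * 1024 * 1024 * 1024))
shmmax=$((ram_bytes / 4))
echo "kernel.shmmax=$shmmax"   # prints kernel.shmmax=1073741824
# note: kernel.shmall is measured in pages (of getconf PAGE_SIZE bytes), not bytes
```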
To persist these values across reboots, add the two kernel.shm* parameters above to /etc/sysctl.conf.
To check what these values are currently set to:

```sh
cat /proc/sys/kernel/shmmax
cat /proc/sys/kernel/shmall
```
For shared_buffers, start with 1/4 of available RAM; cache miss statistics will tell you if you need more. More buffers are needed with many concurrent queries and many CPUs.
shared_buffers = 1/4 of RAM
RAM available for caching query data:
effective_cache_size = 2/3 of the RAM on the server (a normal, conservative setting)
The lower max_connections, the better (each connection allocates a chunk of RAM). Think about what this server actually needs:
max_connections = 32
Turn off fsync of the WAL. Risky, but improves performance (don't do this until it has been thoroughly tested): if the server crashes (unlikely), there will be no log available to restore the latest transactions.
fsync = off # trying this on a staging server for now
Write less to the WAL
checkpoint_segments = 64 # 16 = checkpoint every 512MB; for bulk loading or DW, settings of 64 (1GB) or more are common
checkpoint_completion_target = 0.9 # aim to finish writing the WAL by the time 90% of the next checkpoint interval has passed; only works on PostgreSQL >= 8.3
checkpoint_timeout # defaults to 5min; can be increased too (I've used 25min successfully)
How much statistics to collect; this helps the query planner use indexes.
default_statistics_target = 200 to 1000 for large databases
maintenance_work_mem is used almost exclusively by the vacuum process, but also for CREATE INDEX and adding foreign keys. Like work_mem, it can be set at run time, and larger values are common while performing bulk updates or DB restores. You can additionally use log_temp_files to figure out if sorts are using disk instead of fitting in memory.
maintenance_work_mem = 1/4 of your RAM # for version >= 8.3
maintenance_work_mem = 1/8 of your RAM # for version < 8.3
The default work_mem value is a wimpy 1MB. It tells Postgres how much memory is available before creating temp files during sorts, for queries that require significant sorting and hash operations.
You may set this value on the fly, for example:

```sql
-- within a single connection:
SET work_mem = '1024MB';
SELECT DISTINCT * FROM ... ORDER BY ...;
-- the new value lasts until RESET work_mem or disconnect
```
In general, OS swapping == too much work_mem, while caching sorts in pg_temp == not enough work_mem.
work_mem = 32MB
Which method to use for committing WAL to disk. This is not conservative, but popular:
wal_sync_method = open_sync
How much shared memory is used for WAL data:
wal_buffers = 32MB # for SMP and DW systems with 4GB on the server; for OLTP, 16MB
This setting is all about how expensive it is for your disks to seek to a random disk page. Lower it to 2.5 to encourage the query optimizer to use index scans; the default of 4.0 is too conservative for modern hardware.
random_page_cost = 2.5
This effectively double-checks data page writes to the log. Turn it off if you have reliable hardware and a reliable filesystem:
full_page_writes = off
Support "batch commits" for applications with many concurrent writers:
commit_delay = 250 # I haven't tried this
deadlock_timeout = 2000
Minimize logging on production servers:
log_error_verbosity = terse # writes less than the default
log_min_messages = warning # for dev servers; use error for prod servers
log_statement = 'none' # don't log queries on prod servers
syslog_facility = 'LOCAL2'
syslog_ident = 'postgres'
log_connections = off
log_disconnections = off
log_duration = off
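As a recap, a postgresql.conf fragment for a hypothetical dedicated 4GB server, combining the settings above (starting points to benchmark, not prescriptions), might look like:

```
shared_buffers = 1GB                 # 1/4 of RAM
effective_cache_size = 2700MB        # ~2/3 of RAM
work_mem = 32MB
maintenance_work_mem = 1GB           # 1/4 of RAM (version >= 8.3)
max_connections = 32
checkpoint_segments = 64
checkpoint_completion_target = 0.9
wal_buffers = 32MB
wal_sync_method = open_sync
random_page_cost = 2.5
log_error_verbosity = terse
log_min_messages = error
log_statement = 'none'
```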