Skip to content

Instantly share code, notes, and snippets.

@philandstuff
Created March 7, 2014 15:53
Show Gist options
  • Save philandstuff/9414102 to your computer and use it in GitHub Desktop.
Save philandstuff/9414102 to your computer and use it in GitHub Desktop.
russ garrett talking about postgres

russ garrett on postgres

  • I like it as a default database when I don’t know what database I want to use

schemas

  • I like schemas and I hate schemaless databases
  • it’s useful to be able to enforce your schema before the data gets written to disk
    • rather than in your ORM!
    • CHECK constraints: ensure balance >= 0

schema changes often aren’t great

  • this is why people don’t like schemas
  • in mysql, adding a column to a table locks the table while it adds the column to every row

postgres makes this better

  • zero-downtime schema changes
  • all schema changes are transactional
  • some schema changes are O(1)
    • adding a nullable column without a default
    • dropping column
    • some trivial column type changes (varchar to text, small int to big int)
  • CREATE INDEX CONCURRENTLY
    • not strictly concurrent
  • some dancing to add a non-nullable column
    • add nullable nondefaulted column
    • set default on column
    • update all rows (takes a while, but doesn’t block writes)
    • remove nullable from column

schemaless

  • can use hstore, xml, or json for column types
    • hstore since time immemorial
    • xml since 8.3
    • json since 9.3

querying json

  • SELECT name FROM articles WHERE data->>’type’ = ‘post’;
  • SELECT data->’keywords’->>0 FROM articles;
  • this is quite slow because it has to parse the json in every row

indexing json fields

  • CREATE INDEX articles_data ON articles USING GIN(data);
  • or
  • CREATE INDEX articles_data ON articles(data->>’type’);

code in the database

0:12:07
postgres has loads of stored procedure languages
0:12:38
pgSQL, tcl, perl, python (included)
0:12:47
extensions: js, ruby, scheme, java, php
0:13:34
<example javascript stored procedure>

backups

0:15:05
log replication can be used for automated incremental backups
0:15:53
always less than 5 minutes old (though can be configured to be even younger)

WAL (write ahead log) archiving using WAL-E

0:17:41
automates base backups
0:17:58
saves log segments to S3 or other block stores
0:18:13
compresses (LZO) and encrypts (GPG)
0:18:25
automates segment retrieval
0:19:12
having incremental WAL backups allows point-in-time recovery
0:19:36
eg to right before someone ran an UPDATE without a WHERE

slaves

0:20:34
“cold slave” reads log incremental backups
0:21:01
separate setting to allow reads from slaves
0:21:12
cold slave has same guarantees – less than 5 minutes old
0:21:22
can do streaming replication instead
0:21:30
synchronous replication: postgres waits for at least one of the slaves to write to disk before completing a commit (slow unless they’re on the same network)

sharding and clustering

0:23:38
all the good clustering solutions for postgres are commercial
0:24:28
if you have have a dataset which is huge, postgres is not the right database
0:24:42
if it’s merely large, postgres on a beefy server might be easier in the long run than a distributed solution

GIS and such

0:25:10
geographic querying
0:25:18
PostGIS is great, but it’s a bit weird
0:25:24
SELECT AddGeometryColumn('tube_stations', 'geom', 4326, 'POINT', 2);
0:26:02
^ wtf. this is how you add a geometry-aware column
0:27:05
INSERT INTO tube_stations(name, geom) VALUES ('Holborn', ST_GeomFromTest('POINT....))
0:27:48
supports indexes:
0:27:48
CREATE INDEX tube_stations_geom_gist ON tube_stations USING GIST (geom);
0:28:51
index enables queries of “what’s in this box”?
0:29:23
PostGIS is also supported by other GIS applications

credits

0:30:00
twitter: @russss
0:30:08
http://russ.garrett.co.uk/talks/postgres-gds

q&a

0:32:21
what’s the failover process for synchronous replication?
0:32:33
automating it is hard
0:32:52
once something comes out of replication mode, it enters a different timeline
0:33:43
where does postgres need resources? where would you spend money on postgres?
0:34:05
a lot of OSS contributors are from companies with money to throw at postgres
0:34:30
I would say clustering, but it’s such a huge thing that I don’t know how you’d go about running that
0:38:42
user model & authentication model: is there an easy way to explain it? what do you end up doing in a prod system?
0:39:07
once you have the knack for it, it’s not that hard to comprehend but it is a bit weird
0:39:22
why? it’s at heart a fairly old school unix database
0:39:39
you have to configure the db user and then configure their access
0:40:48
the default configuration in most distros is a bit of a faff
0:41:03
is vacuuming still painful?
0:41:10
don’t let txns run for a long time
0:41:17
they’re especially bad in postgres because it keeps the state of the db as it was at the start of txn
0:42:09
other than that, autovacuuming should take care of everything.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment