philandstuff/russ-garrett.org

## russ-garrett.org

      
    Raw
  

              russ-garrett.org
            
          
    russ garrett on postgres


  I like it as a default database when I don’t know what database I want to use

schemas


  I like schemas and I hate schemaless databases
  it’s useful to be able to enforce your schema before the data
    gets written to disk
    
      rather than in your ORM!
      CHECK constraints: ensure balance >= 0
    
  
schema changes often aren’t great


  this is why people don’t like schemas
  in mysql, adding a column to a table locks the table while it
    adds the column to every row

postgres makes this better


  zero-downtime schema changes
  all schema changes are transactional
  some schema changes are O(1)
    
      adding a nullable column without a default
      dropping column
      some trivial column type changes (varchar to text, small int
        to big int)
    
  
  CREATE INDEX CONCURRENTLY
    
      not strictly concurrent
    
  
  some dancing to add a non-nullable column
    
      add nullable nondefaulted column
      set default on column
      update all rows (takes a while, but doesn’t block writes)
      remove nullable from column
    
  
schemaless


  can use hstore, xml, or json for column types
    
      hstore since time immemorial
      xml since 8.3
      json since 9.3
    
  
querying json


  SELECT name FROM articles WHERE data->>’type’ = ‘post’;
  SELECT data->’keywords’->>0 FROM articles;
  this is quite slow because it has to parse the json in every row

indexing json fields


  CREATE INDEX articles_data ON articles USING GIN(data);
  or
  CREATE INDEX articles_data ON articles(data->>’type’);

code in the database


  0:12:07
postgres has loads of stored procedure languages
    
      0:12:38
pgSQL, tcl, perl, python (included)
      0:12:47
extensions: js, ruby, scheme, java, php
    
  
  0:13:34
<example javascript stored procedure>

backups


  0:15:05
log replication can be used for automated incremental
    backups
  0:15:53
always less than 5 minutes old (though can be
    configured to be even younger)

WAL (write ahead log) archiving using WAL-E


  0:17:41
automates base backups
  0:17:58
saves log segments to S3 or other block stores
  0:18:13
compresses (LZO) and encrypts (GPG)
  0:18:25
automates segment retrieval
  0:19:12
having incremental WAL backups allows point-in-time recovery
    
      0:19:36
eg to right before someone ran an UPDATE without a WHERE
    
  
slaves


  0:20:34
“cold slave” reads log incremental backups
  0:21:01
separate setting to allow reads from slaves
  0:21:12
cold slave has same guarantees – less than 5 minutes old
  0:21:22
can do streaming replication instead
  0:21:30
synchronous replication: postgres waits for at least
    one of the slaves to write to disk before completing
    a commit (slow unless they’re on the same network)

sharding and clustering


  0:23:38
all the good clustering solutions for postgres are
    commercial
  0:24:28
if you have have a dataset which is huge, postgres is
    not the right database
  0:24:42
if it’s merely large, postgres on a beefy server
    might be easier in the long run than a distributed
    solution

GIS and such


  0:25:10
geographic querying
  0:25:18
PostGIS is great, but it’s a bit weird
  0:25:24
SELECT AddGeometryColumn('tube_stations', 'geom', 4326, 'POINT', 2);
    
      0:26:02
^ wtf. this is how you add a geometry-aware column
    
  
  0:27:05
INSERT INTO tube_stations(name, geom) VALUES ('Holborn', ST_GeomFromTest('POINT....))
  0:27:48
supports indexes:
  0:27:48
CREATE INDEX tube_stations_geom_gist ON tube_stations USING GIST (geom);
  0:28:51
index enables queries of “what’s in this box”?
  0:29:23
PostGIS is also supported by other GIS applications

credits


  0:30:00
twitter: @russss
  0:30:08
http://russ.garrett.co.uk/talks/postgres-gds

q&a


  0:32:21
what’s the failover process for synchronous replication?
    
      0:32:33
automating it is hard
      0:32:52
once something comes out of replication mode, it
        enters a different timeline
    
  
  0:33:43
where does postgres need resources? where would you
    spend money on postgres?
    
      0:34:05
a lot of OSS contributors are from companies with
        money to throw at postgres
      0:34:30
I would say clustering, but it’s such a huge thing
        that I don’t know how you’d go about running that
    
  
  0:38:42
user model & authentication model: is there an easy
    way to explain it? what do you end up doing in a prod
    system?
    
      0:39:07
once you have the knack for it, it’s not that hard
        to comprehend but it is a bit weird
      0:39:22
why? it’s at heart a fairly old school unix
        database
      0:39:39
you have to configure the db user and then
        configure their access
      0:40:48
the default configuration in most distros is a bit
        of a faff
    
  
  0:41:03
is vacuuming still painful?
    
      0:41:10
don’t let txns run for a long time
      0:41:17
they’re especially bad in postgres because it keeps
        the state of the db as it was at the start of txn
      0:42:09
other than that, autovacuuming should take care of everything.