Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
intro to document stores in postgreSQL

Document stores in PostgreSQL

Notes for software engineering meeting presentation

What

  • mid to late 2000s: appearance of Document stores / NoSQL databases such as Mongo, Couch
  • Relational DBs now have support for document data: JSON in MySQL, JSON and JSONB in PostgreSQL
  • Focus on JSONB in Postgres (most full featured)

How

  • Implemented as a data type - i.e. a column in a regular table
  • Adding document store functionality to your relational database (not choosing one or the other)

Adding JSONB data

create table:

CREATE TABLE people id SERIAL PRIMARY KEY data JSONB;

get some json data:

jsondata = '{
	"firstname": "Glen",
	"lastname": "Newton",
	"tags": "['manager ', 'vegetarian ', 'orange_pants ']"
}'

insert (checks for valid JSON):

INSERT into people (data) VALUES (jsondata);

Querying

There are special operators and functions for querying JSONB data. See postgres docs for details.

  • get by key (-> operator):

    SELECT data->'firstname' FROM people;

  • search for key/value (@> operator):

    SELECT id FROM people WHERE data @> '{"tag":["orange_pants"]}';

  • get as text (->> operator):

    SELECT id FROM people WHERE data->>('lastname') ilike ('%New%');

  • get all of the keys:

    SELECT jsonb_object_keys(data) FROM people;

  • get value:

    SELECT jsonb_extract_path(data,'firstname') FROM people;

  • get json in a pretty way:

    SELECT

Indexing

You can index your JSONB column to efficiently search key / value pairs using the GIN operator. To index all top-level keys and key/value pairs:

CREATE INDEX idxgin ON people USING GIN (data);

The less well-structured your json, the bigger the index! Create index on a specific column if that search is going to be common:

CREATE INDEX idxgintags ON people USING GIN ((data -> 'tags'));

When

When should you use JSONB in your relational database?

Example project:

  • opentreeoflife phylogeny index otindex
    • already have data in json format (in separate github repo)
    • database functions as an index only not the canonical data store
    • some aspect of json very standardized and complete - these parts we extract and put into specific columns
    • json also contains sparse data (many values missing for many keys)
    • want to easily pick up new keys
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment