Skip to content

Instantly share code, notes, and snippets.

@lcowell
Last active September 15, 2017 23:22
Show Gist options
  • Save lcowell/920db2ff19d67d159a53dd461a71e150 to your computer and use it in GitHub Desktop.
Save lcowell/920db2ff19d67d159a53dd461a71e150 to your computer and use it in GitHub Desktop.

I've been looking at some lower level storage stuff for postgres and I was curious about when postgres was writing stuff to disk, specifically when working with updates or transactions.

The ctid represents the physical storage location of the row on disk. Something like (5,8) could be thought of as page 5 + (offset 8 x row width). By looking at this we can know the position of the last written row, which means we'll know if a new row is being written of if the row is being reused.

For example, the last row written to the users table is (5,8).

front_desk_development=# select MAX(ctid) from users;
  max
-------
 (5,8)
(1 row)

Time: 0.352 ms

If I insert a row in to the database, see how the page offset increases from 8 to 9.

front_desk_development=# insert into users ("email", "created_at", "updated_at") values ('luke@talesa.net', NOW(), NOW());
INSERT 0 1
Time: 2.611 ms
front_desk_development=# select MAX(ctid) from users;
  max
-------
 (5,9)
(1 row)
Time: 0.390 ms

Let's try it in a transaction and see at what point in the lifecycle we write out a new row. Let's begin the transaction and check that our ctid hasn't changed.

front_desk_development=# begin transaction;
BEGIN
Time: 0.171 ms
front_desk_development=# select MAX(ctid) from users;
  max
-------
 (5,9)
(1 row)
Time: 0.333 ms

Let's insert the row and see what happens. I was expecting that things weren't written out until the transaction is commited, but we can see that our offset has incremented.

front_desk_development=# insert into users ("email", "created_at", "updated_at") values ('luke@talesa.net', NOW(), NOW());
INSERT 0 1
Time: 0.452 ms
front_desk_development=# select MAX(ctid) from users;
  max
--------
 (5,10)
(1 row)
Time: 1.050 ms

As expected calling commit has no affect on the max ctid.

front_desk_development=# commit;
COMMIT
Time: 2.222 ms
front_desk_development=# select MAX(ctid) from users;
  max
--------
 (5,10)
(1 row)

Time: 0.378 ms

Let's try again by doing an insert inside a transaction and see what happens when we rollback.

front_desk_development=# begin transaction;
BEGIN
Time: 0.179 ms
front_desk_development=# insert into users ("email", "created_at", "updated_at") values ('luke@talesa.net', NOW(), NOW());
INSERT 0 1
Time: 3.475 ms
front_desk_development=# select MAX(ctid) from users;
  max
--------
 (5,11)
(1 row)

Time: 0.834 ms

As before our data is written to disk immediately.

front_desk_development=# rollback;
ROLLBACK
Time: 0.239 ms
front_desk_development=# select MAX(ctid) from users;
  max
--------
 (5,10)
(1 row)

Time: 0.486 ms

The highest available ctid has gone back to before we started the transaction. I'd think this is the equivalent of deleting a row. Let's see what happens when we delete the last row (highest id) of the table.

front_desk_development=# select MAX(ctid) from users;
  max
--------
 (5,10)
(1 row)

Time: 0.353 ms
front_desk_development=#  DELETE FROM users where id=(SELECT MAX(id) from users);
DELETE 1
Time: 2.549 ms
front_desk_development=# select MAX(ctid) from users;
  max
--------
 (5,9)
(1 row)

Time: 0.390 ms

Now that we deleted the row, let's see if postgres reuses the position (5,10) on disk.

front_desk_development=# insert into users ("email", "created_at", "updated_at") values ('luke@talesa.net', NOW(), NOW());
INSERT 0 1
Time: 2.641 ms
front_desk_development=# select MAX(ctid) from users;
  max
--------
 (5,12)
(1 row)

Time: 0.359 ms

Of course it doesn't, it skips to the next clean offset on the page.

This definitely incentivises avoiding unnecessary updates when working with rails. The following example will make 2 writes:

user.update_attribute(:email, "test@example.com")
user.update_attribute(:first_name, "Guy")

This example will only make a single write to the table.

# Note: You could also use update_attributes
user.email = "test@example.com"
user.first_name = "Guy"
user.save

The difference here is small, but imagine a workflow where we save after each attribute is changed or ActiveRecord workflows where callbacks are making various writes - that could result in 10s or 100s of written and then deleted rows in the database. Postgres reclaims the deleted rows using autovacuum, but avoiding unnecesarry writes means we'll get better performance from our database servers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment