Skip to content

Instantly share code, notes, and snippets.

@vinnix
Created April 29, 2020 10:02
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save vinnix/2fe148e3c42e11269bac5fcc5c78a8d1 to your computer and use it in GitHub Desktop.
Save vinnix/2fe148e3c42e11269bac5fcc5c78a8d1 to your computer and use it in GitHub Desktop.
Initial Report of PostgreSQL sequences being duplicated during crash

Multiple sessions + Suspending I/O + Autocommit;

## // Terminal 1: Setup
$ pg_ctl start

$ psql -c "DROP SEQUENCE IF EXISTS seqtest" -c "CREATE SEQUENCE seqtest"
$ rm -v seqtest.out

$ for i in {1..10} ; do echo $i; while true; do psql -XAxc "SELECT nextval('seqtest');" &>> seqtest.out; done & done; ##// Note there is no BEGIN;

## // Terminal 2: Tests

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
0

$ sudo dmsetup suspend vg--01--xfstest-lv--xfstest && killall -9 postgres && sudo dmsetup resume vg--01--xfstest-lv--xfstest && pg_ctl start

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
54

$ sudo dmsetup suspend vg--01--xfstest-lv--xfstest && killall -9 postgres && sudo dmsetup resume vg--01--xfstest-lv--xfstest && pg_ctl start

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
120

$ sudo dmsetup suspend vg--01--xfstest-lv--xfstest && killall -9 postgres && sudo dmsetup resume vg--01--xfstest-lv--xfstest && pg_ctl start

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
145

## // Terminal 3: Finishing

$ for i in {1..10}; do kill %$i; done;

Single session + Not Suspending I/O + Explicitly opening transaction;

$ pg_ctl start

$ psql -c "DROP SEQUENCE IF EXISTS seqtest" -c "CREATE SEQUENCE seqtest"
$ rm -v seqtest.out

$ while true; do psql -XAxc "BEGIN; SELECT nextval('seqtest');" &>> seqtest.out; done &

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
0

$ killall -9 postgres
$ pg_ctl start ##// or you can let a loop running with:
               ##// while true; do killall -9 postgres; pg_ctl start; sleep 55; done;

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
12

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
20

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
32

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
59

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
69

(...)

$ grep nextval seqtest.out | cut -d\| -f2 | sort | uniq -d | wc -l
112
@vinnix
Copy link
Author

vinnix commented Apr 29, 2020

11:06 < vnnx> morning.. would you expect sequences to be "crash safe"? I mean, when using nextval() specifically.
11:06 < lluad_> vnnx, Yes. But your understanding of crash safe might not be the same as mine. What are you concerned about?
11:06 -!- lluad_ is now known as lluad
11:07 < vnnx> I'm working on something here, that can reproduce sequences being duplicated during unexpected `killall -9 postgres`. I am trying to reach the code path where this is happening
11:07 < vnnx> ... maybe to purpose a patch on hackers mailing list
11:11 < lluad> Interesting. If you've got an example that'll demonstrate what you're seeing someone who knows the guts of the implementation might have something useful to say.
11:14 < vnnx> yeah, it's not a simple "code path" I would say. right now I am not positive if the problem is indeeed on the sequence.c (nextval_internal) OR somewhere else
11:14 < RhodiumToad> so the thing about sequences is this
11:15 < vnnx> I do have a example
11:15 < vnnx> hi RhodiumToad
11:15 < RhodiumToad> the wal record for the advancement of the sequence, which happens only once per 32 nextval calls, is usually not immediately flushed to disk
11:15 < RhodiumToad> typically, since you just used the sequence to insert something, the commit of that insert will flush it to disk anyway
11:16 < RhodiumToad> so if you just do select nextval('blah'); in a transaction which you don't commit, and crash the db at that point, it's not impossible for it to duplicate the value
11:17 < vnnx> right.. the SEQ_LOG_VALS sets it to 32.
11:17 < RhodiumToad> but remember that those 32 values are logged _in advance_
11:17 < RhodiumToad> so typically what happens is that the sequence jumps _forwards_ by 1 to 31 values as a result of the crash
11:18 < vnnx> yeah, that's what is puzzling me. I mean, if we are loggin in advance why I still see dups?
11:19 < RhodiumToad> what exactly are you doing?
11:19 < vnnx> in summary: https://gist.github.com/vinnix/2fe148e3c42e11269bac5fcc5c78a8d1
11:20 < vnnx> right now I'm playing around with pg_waldump and diving into the code to undertand what I am seeing
11:22 < RhodiumToad> can't browse right now because I'm in the middle of updating stuff
11:22 < vnnx> I was thinking to come up with a patch and send it on pg-hackers, but I not sure ... probably I will start a thread there without an initial patch and see how we go from there.
11:23 < vnnx> no worries, but basically it's multiple SELECT nextval('bla'); as you said, saving it to a file.
11:23 < vnnx> there are two variations, one without auto-commit.. (and without commit as well)..
11:24 < vnnx> and other with auto-commit, but suspending the volume (`dmsetup suspend`) to simulate a storage problem
11:25 < vnnx> the "crash event" is just a 'killall -9 postgres’
11:26 < vnnx> to observe the duplication a save the results of nextval() into a text file.
11:26 < vnnx> (so yeah, no inserts here, only sequence's code path)
11:27 < lluad> If you're not seeing any issue with commited data I'd call that an implementation feature rather than a bug.
11:27 < vnnx> true
11:27 < vnnx> (a difference kind of insect) :)
11:28 < lluad> Any patch that changed that is likely to have noticable impact on performance, so I wouldn't bother with one unless it's both causing a real issue and you've discussed it on pg-hackers
first.
11:28 < vnnx> err.. different
11:30 < vnnx> right, makes sense.
11:31 < vnnx> I was thinking here, if a feature like that impacts the performance (and there might not be good to everyone, while others may care) that could be a sequence attribute [knob]

@vinnix
Copy link
Author

vinnix commented Apr 29, 2020

@vinnix
Copy link
Author

vinnix commented Apr 29, 2020

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fd6a3f3ad4067f1b8fc28e9de6e99e5936d82161

To fix all this, make nextval() acquire a top level xid, and only wait for
transaction commit if a transaction both acquired a xid and emitted WAL
records.  If only a xid has been assigned we don't uselessly want to
wait just because of writes to temporary/unlogged tables; if only WAL
has been written we don't want to wait just because of HOT prunes.

The xid assignment in nextval() is unlikely to cause overhead in
real-world workloads. For one it only happens SEQ_LOG_VALS/32 values
anyway, for another only usage of nextval() without using the result in
an insert or similar is affected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment