You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
11:06 < vnnx> morning.. would you expect sequences to be "crash safe"? I mean, when using nextval() specifically.
11:06 < lluad_> vnnx, Yes. But your understanding of crash safe might not be the same as mine. What are you concerned about?
11:06 -!- lluad_ is now known as lluad
11:07 < vnnx> I'm working on something here, that can reproduce sequences being duplicated during unexpected `killall -9 postgres`. I am trying to reach the code path where this is happening
11:07 < vnnx> ... maybe to purpose a patch on hackers mailing list
11:11 < lluad> Interesting. If you've got an example that'll demonstrate what you're seeing someone who knows the guts of the implementation might have something useful to say.
11:14 < vnnx> yeah, it's not a simple "code path" I would say. right now I am not positive if the problem is indeeed on the sequence.c (nextval_internal) OR somewhere else
11:14 < RhodiumToad> so the thing about sequences is this
11:15 < vnnx> I do have a example
11:15 < vnnx> hi RhodiumToad
11:15 < RhodiumToad> the wal record for the advancement of the sequence, which happens only once per 32 nextval calls, is usually not immediately flushed to disk
11:15 < RhodiumToad> typically, since you just used the sequence to insert something, the commit of that insert will flush it to disk anyway
11:16 < RhodiumToad> so if you just do select nextval('blah'); in a transaction which you don't commit, and crash the db at that point, it's not impossible for it to duplicate the value
11:17 < vnnx> right.. the SEQ_LOG_VALS sets it to 32.
11:17 < RhodiumToad> but remember that those 32 values are logged _in advance_
11:17 < RhodiumToad> so typically what happens is that the sequence jumps _forwards_ by 1 to 31 values as a result of the crash
11:18 < vnnx> yeah, that's what is puzzling me. I mean, if we are loggin in advance why I still see dups?
11:19 < RhodiumToad> what exactly are you doing?
11:19 < vnnx> in summary: https://gist.github.com/vinnix/2fe148e3c42e11269bac5fcc5c78a8d1
11:20 < vnnx> right now I'm playing around with pg_waldump and diving into the code to undertand what I am seeing
11:22 < RhodiumToad> can't browse right now because I'm in the middle of updating stuff
11:22 < vnnx> I was thinking to come up with a patch and send it on pg-hackers, but I not sure ... probably I will start a thread there without an initial patch and see how we go from there.
11:23 < vnnx> no worries, but basically it's multiple SELECT nextval('bla'); as you said, saving it to a file.
11:23 < vnnx> there are two variations, one without auto-commit.. (and without commit as well)..
11:24 < vnnx> and other with auto-commit, but suspending the volume (`dmsetup suspend`) to simulate a storage problem
11:25 < vnnx> the "crash event" is just a 'killall -9 postgres’
11:26 < vnnx> to observe the duplication a save the results of nextval() into a text file.
11:26 < vnnx> (so yeah, no inserts here, only sequence's code path)
11:27 < lluad> If you're not seeing any issue with commited data I'd call that an implementation feature rather than a bug.
11:27 < vnnx> true
11:27 < vnnx> (a difference kind of insect) :)
11:28 < lluad> Any patch that changed that is likely to have noticable impact on performance, so I wouldn't bother with one unless it's both causing a real issue and you've discussed it on pg-hackers
first.
11:28 < vnnx> err.. different
11:30 < vnnx> right, makes sense.
11:31 < vnnx> I was thinking here, if a feature like that impacts the performance (and there might not be good to everyone, while others may care) that could be a sequence attribute [knob]
To fix all this, make nextval() acquire a top level xid, and only wait for
transaction commit if a transaction both acquired a xid and emitted WAL
records. If only a xid has been assigned we don't uselessly want to
wait just because of writes to temporary/unlogged tables; if only WAL
has been written we don't want to wait just because of HOT prunes.
The xid assignment in nextval() is unlikely to cause overhead in
real-world workloads. For one it only happens SEQ_LOG_VALS/32 values
anyway, for another only usage of nextval() without using the result in
an insert or similar is affected.