Skip to content

Instantly share code, notes, and snippets.

@niner
Created January 26, 2021 18:24
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save niner/7f7cb0a79db9ee933631ca47d44c2029 to your computer and use it in GitHub Desktop.
Save niner/7f7cb0a79db9ee933631ca47d44c2029 to your computer and use it in GitHub Desktop.
One of our Inline::Perl5 using Cro applications craps out seemingly randomly with errors like "DBIx::Class::Storage::DBI::Cursor::next(): DBI Exception: DBD::Pg::st fetch failed: no statement executing"
Since I know that Inline::Perl5 and DBIx::Class both do some crazy stunts to get memory management right, it stands to reason that it's some kind of reference count issue.
Turns out, it's actually not really an Inline::Perl5 bug at all. It's more an unfortunate side effect of the differences between Perl, Raku and DBIx::Class' features.
The "no statement executing" is thrown by DBD::Pg's dbd_st_fetch method when it detects that the statement handle is not actually active anymore, i.e. there's no query running and we have nothing to fetch. I simply looked for the error string and came up with https://metacpan.org/source/TURNSTEP/DBD-Pg-3.14.2/dbdimp.c#L3756
I ran Atikon-API in rr via:
PORT=10001 rr record rakudo service.pl6
and used Wolfgang's script to generate requests:
rakudo uebersicht_domain_stress.raku local
until the error appeared (after just a few requests).
With rr replay I got the recorded session in a debugger. I put a breakpoint at the place where the error gets generated:
(rr) break dbdimp.c:3756
(rr) continue
which got me to the place where the error occurs:
Thread 3 hit Breakpoint 1, pg_st_fetch (sth=0x7f95ff53c7e8, imp_sth=0x7f95fe9ad910) at dbdimp.c:3756
3756 pg_error(aTHX_ sth, PGRES_NONFATAL_ERROR, "no statement executing\n");
The failing check is: if ( !DBIc_ACTIVE(imp_sth) )
So I had a look at imp_sth:
(rr) p *imp_sth
$10 = {com = {std = {flags = 1613971, call_depth = 1, type = 3, my_h = 0x7f95ff5320c0, parent_h = 0x7f95ff5345f0, parent_com = 0x7f95fd8c68d0, thr_user = 0x7f96000ab5c0, imp_stash = 0x7f95fdaa34a0, imp_data = 0x7f96000ab710, kids = 0, active_kids = 0,
pid = 24933, dbistate = 0x7f95fc7e1e50}, attr = {TraceLevel = 0x7f95ff534b18, State = 0x7f95fb88b628, Err = 0x7f95ff53d008, Errstr = 0x7f95ff53c938, ErrCount = 0, LongReadLen = 80, FetchHashKeyName = 0x7f95ff5330f8}, num_params = 1, num_fields = 10,
fields_svav = 0x7f95ff533878, row_count = 12, fields_fdav = 0x0, spare1 = 0, spare2 = 0x0}, server_prepare = true, switch_prepared = 2, number_iterations = 12, placeholder_type = PLACEHOLDER_QUESTIONMARK, numsegs = 2, numphs = 1, numbound = 0,
cur_tuple = 12, rows = 12, async_flag = 0, async_status = 0, totalsize = 178, PQvals = 0x7f95fab363a0, PQlens = 0x0, PQfmts = 0x0, PQoids = 0x7f95fe0d1a70, prepare_name = 0x7f95ff56b6d0 "dbdpg_p24933_5", firstword = 0x7f95fe0d1a90 "SELECT",
result = 0x7f95fefe2650, type_info = 0x7f95fe9af330, seg = 0x7f95fe9aac10, ph = 0x7f95fe9e4780, prepare_now = false, prepared_by_us = true, direct = false, is_dml = true, has_binary = false, has_default = false, has_current = false, dollaronly = false,
nocolons = false, use_inout = false, all_bound = true}
Looking through DBI's source code, I found out that the Active flag is imp_sth->com.std.flags & 0x4, which indeed is off:
(rr) p imp_sth->com.std.flags & 0x000004
$13 = 0
So I set a watch point at that address to catch it when it gets changed and let the program run in reverse till the watchpoint triggers:
(rr) watch -l imp_sth->com.std.flags
Hardware watchpoint 5: -location imp_sth->com.std.flags
(rr) reverse-cont
Continuing.
Thread 3 hit Hardware watchpoint 5: -location imp_sth->com.std.flags
Old value = 1613971
New value = 1613975
0x00007f9618099fe0 in pg_st_finish (sth=0x7f95ff53c7e8, imp_sth=0x7f95fe9ad910) at dbdimp.c:3954
3954 DBIc_ACTIVE_off(imp_sth);
Got it. The difference between Old and New is precicely 4 (which is the Active flag). Don't be confused by Old and New here. We are running in reverse, so Old is what it got changed to.
The backtrace shows that we're in Inline::Perl5's p5_sv_destroy function:
(rr) bt
#0 0x00007f9618099fe0 in pg_st_finish (sth=0x7f95ff53c7e8, imp_sth=0x7f95fe9ad910) at dbdimp.c:3954
#1 0x00007f9618084f28 in XS_DBD__Pg__st_finish (my_perl=0x7f96000ab5c0, cv=<optimized out>) at ./Pg.xsi:757
#2 0x00007f961c1a6447 in XS_DBI_dispatch () from /usr/lib/perl5/vendor_perl/5.32.0/x86_64-linux-thread-multi/auto/DBI/DBI.so
#3 0x00007f961c94d5d7 in Perl_pp_entersub (my_perl=0x7f96000ab5c0) at pp_hot.c:5277
#4 0x00007f961c943c96 in Perl_runops_standard (my_perl=0x7f96000ab5c0) at run.c:41
#5 0x00007f961c8ab394 in Perl_call_sv (my_perl=my_perl@entry=0x7f96000ab5c0, sv=<optimized out>, flags=flags@entry=45) at perl.c:3091
#6 0x00007f961c9515d3 in S_curse (my_perl=my_perl@entry=0x7f96000ab5c0, sv=sv@entry=0x7f95ffc16868, check_refcnt=check_refcnt@entry=true) at sv.c:7031
#7 0x00007f961c951cc0 in Perl_sv_clear (my_perl=my_perl@entry=0x7f96000ab5c0, orig_sv=orig_sv@entry=0x7f95ff42d8a8) at sv.c:6628
#8 0x00007f961c95228e in Perl_sv_free2 (my_perl=0x7f96000ab5c0, sv=0x7f95ff42d8a8, rc=<optimized out>) at sv.c:7132
#9 0x00007f961cbc7b63 in Perl_SvREFCNT_dec (my_perl=0x7f96000ab5c0, sv=0x7f95ff42d8a8) at /usr/lib/perl5/5.32.0/x86_64-linux-thread-multi/CORE/inline.h:235
#10 0x00007f961cbc92a0 in p5_sv_destroy (my_perl=0x7f96000ab5c0, sv=0x7f95ff42d8a8) at p5helper.c:364
But where are we in the Raku code?
(rr) break MVM_frame_invoke
Breakpoint 6 at 0x7f9627d22ec0: file src/core/frame.c, line 406.
(rr) reverse-cont
Continuing.
Thread 3 hit Breakpoint 6, MVM_frame_invoke (tc=0x7f961432e390, static_frame=0x55a5078ca848, callsite=0x7f96282d5620 <two_args_callsite>, args=0x7f960352a008, outer=0x55a50cebcc00, code_ref=0x55a50cfaab40, spesh_cand=-1) at src/core/frame.c:406
406 if (static_frame->body.instrumentation_level != tc->instance->instrumentation_level) {
(rr) call MVM_dump_backtrace(tc)
at vendor#sources/D05B283A0B3853A171F242BF3BBECC84E0DA5A31 (Inline::Perl5::ClassHOW):172 (/usr/share/perl6/vendor/precomp/A35105D1BA7F020CD8303F9E55E37997A5C30CE4/D0/D05B283A0B3853A171F242BF3BBECC84E0DA5A31:)
from vendor#sources/D05B283A0B3853A171F242BF3BBECC84E0DA5A31 (Inline::Perl5::ClassHOW):172 (/usr/share/perl6/vendor/precomp/A35105D1BA7F020CD8303F9E55E37997A5C30CE4/D0/D05B283A0B3853A171F242BF3BBECC84E0DA5A31:DESTROY)
from vendor#sources/D05B283A0B3853A171F242BF3BBECC84E0DA5A31 (Inline::Perl5::ClassHOW):123 (/usr/share/perl6/vendor/precomp/A35105D1BA7F020CD8303F9E55E37997A5C30CE4/D0/D05B283A0B3853A171F242BF3BBECC84E0DA5A31:DESTROY)
from gen/moar/BOOTSTRAP/v6c.nqp:3975 (/usr/share/perl6/lib/Perl6/BOOTSTRAP/v6c.moarvm:)
from vendor#sources/2FBC7BEE9A6C3A25A5E5FAFBBB48EE7EBF2BF2D6 (Inline::Perl5::Array):28 (/usr/share/perl6/vendor/precomp/A35105D1BA7F020CD8303F9E55E37997A5C30CE4/2F/2FBC7BEE9A6C3A25A5E5FAFBBB48EE7EBF2BF2D6:Array)
from vendor#sources/2FBC7BEE9A6C3A25A5E5FAFBBB48EE7EBF2BF2D6 (Inline::Perl5::Array):40 (/usr/share/perl6/vendor/precomp/A35105D1BA7F020CD8303F9E55E37997A5C30CE4/2F/2FBC7BEE9A6C3A25A5E5FAFBBB48EE7EBF2BF2D6:iterator)
from SETTING::src/core.c/Hash.pm6:76 (/usr/share/perl6/runtime/CORE.c.setting.moarvm:STORE)
from SETTING::src/core.c/Any.pm6:80 (/usr/share/perl6/runtime/CORE.c.setting.moarvm:hash)
from /home/nine/work/Atikon-API/lib/API/v1/Model/Homepage.pm6 (API::v1::Model::Homepage):59 (/home/nine/work/Atikon-API/lib/.precomp/A35105D1BA7F020CD8303F9E55E37997A5C30CE4/46/46F010C9C012CDB23D396EBC0EA4E2B115300A3D:get-socialmedia-metadata)
(Inline::Perl5::ClassHOW):172 is a generic DESTROY method which calls p5_sv_destroy on the wrapped Perl 5 object.
So, what I see is a destructor of a Raku object representing an Atikon::DB::Timemngt::ResultSet::File running, which destroys the corresponding Perl resultset object which leads to a call to DBIx::Class::Storage::DBI::Cursor::__finish_sth which clears the Active flag of the statement handle.
This actually seems to work as inteded. The only problem is, that we're currently also accessing a different ResultSet::File and this gets affected, too somehow. The explanation for this is that DBIx::Class caches statement handles by default.
So, we're creating a ResultSet object, run a query, iterate through the results, all just fine.
We do this several times in a row and DBIx::Class will happily re-use the cached statement.
But then MoarVM's garbage collector kicks in and collects one of these ResultSet objects. This triggers cleanup (as is proper) which in turn forces the statement handle to finish, even though we still want to use it.
Knowing this, the fix is rather trivial:
diff --git a/lib/API/v1/Perl5/SchemaConnector.pm6 b/lib/API/v1/Perl5/SchemaConnector.pm6
index cf1d2fd..ced6fef 100644
--- a/lib/API/v1/Perl5/SchemaConnector.pm6
+++ b/lib/API/v1/Perl5/SchemaConnector.pm6
@@ -30,7 +30,12 @@ method schema-from-thread() {
my Str $db_pass = @!connect_info[2];
$thread.schema = $.p5-interpreter.invoke(
- 'Atikon::DB::Timemngt', 'connect', $db_dsn, $db_user, $db_pass
+ 'Atikon::DB::Timemngt',
+ 'connect',
+ $db_dsn,
+ $db_user,
+ $db_pass,
+ ${ disable_sth_caching => 1 }
);
}
}
DBIx::Class::Schema::Storage::DBI has an option to disable caching of statement handles. With this active the test script runs through thousands of requests successfully without a single failure.
Now why is this not a problem for pure Perl applications? In Perl, garbage collection is rather more predictable. An object is collected when the last reference to it is removed. This often happens when a variable goes out of scope. Usually this won't happen in the middle of unrelated operations. But it actually can be triggered manually:
perl -E '
use Atikon::DB::Timemngt;
my $schema = Atikon::DB::Timemngt->connect("dbi:Pg:dbname=timemngt");
my $rs1 = $schema->resultset("File")->search({product_id => 154343});
while ($rs1->next) { };
$rs2 = $schema->resultset("File")->search({product_id => 154343});
while ($rs2->next) {
undef $rs1;
}'
And bang:
DBIx::Class::Storage::DBI::Cursor::next(): DBI Exception: DBD::Pg::st fetch failed: no statement executing [for Statement "SELECT me.id, me.product_id, me.filepath, me.created_at, me.created_by, me.last_changed_at, me.last_changed_by, me.checksum, me.name, me.tag FROM files me WHERE ( product_id = ? )" with ParamValues: 1='154343'] at -e line 1
Same error. Curiously, this won't happen when you replace the while ($rs1->next) { } with a simple $rs1->all; This is because $rs1->all will call $rs1->reset when it's done. According to the docs, this also resets the cursor. This points us at the second possible fix:
diff --git a/lib/API/v1/Model/Homepage.pm6 b/lib/API/v1/Model/Homepage.pm6
index 362918e..28caad9 100644
--- a/lib/API/v1/Model/Homepage.pm6
+++ b/lib/API/v1/Model/Homepage.pm6
@@ -64,6 +64,8 @@ method get-socialmedia-metadata(Int $homepage_id!) {
@files.push: %data;
}
+ $rs.reset;
+
%result<files> = @files;
# Model Letter
And stability ensues. The first patch is the more secure one (you can't forget the reset call) while the second patch has better performance. Personally, I'd pick the more secure one.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment