@ribasushi
Last active January 25, 2020 09:12
[11:16:52] <gbjk> ilmari: DBix-Class + IO::Async ?
[11:18:30] * gbjk ponders DBIx::Async and Net::Async::Postgresql
[11:19:42] <tom_m> NaPostgreSQL performance is much better than DBIx::Async, since it's all in-process, API needs updating though
[11:20:16] <gbjk> tom_m: Most of what we do sits on top of DBix::Class.
[11:20:39] <gbjk> We'd be wanting to somehow make that work.
[11:21:47] <tom_m> yeah, I figure that's a pretty common situation. I think the choices there are "hope that it doesn't block for too long" or "move all the db access to a separate process"
[11:22:49] <gbjk> tom_m: We've got a c-tree isam backed db.
[11:23:00] <gbjk> tom_m: Some good queries take 15 seconds.
[11:23:05] <gbjk> tom_m: Some bad ones take a lot longer.
[11:23:12] <gbjk> tom_m: Async would make the most sense.
[11:23:36] <tom_m> sadly DBIC isn't really designed for async, not sure if there's been work to change that recently (I don't use it myself).
[11:24:01] <gbjk> Yeah. All of everything would need to be changed to futures or similar.
[12:08:06] <ribasushi> gbjk: the main issue isn't the code, but the API behavior. Everyone asks for "async" as if it is some sort of widely agreed upon set of traits, without any hints as to what this would mean for a DBIC user *API-wise*
[12:08:58] <ribasushi> it doesn't matter how DBIC is designed currently, the real question is - are the APIs amenable to some sort of useful async (or are there new apis that can be put together that will somehow work in existing *user* codebases)
[12:09:47] <ribasushi> sometimes people ask me for search()es to be async, and start executing immediately. Then I ask about transactions, to which the answer invariably is "can't we worry about this later?"
[12:09:58] <tom_m> I think it would be very hard to get something that works for existing code, unless you take the Coro route (or similar)
[12:10:02] <ribasushi> and conversation stops, because internally we use transactions in a lot of places
[12:11:05] <ribasushi> tom_m: which would be fine, I am more trying to get heavy dbic and async users to present what they would like to see
[12:11:20] <gbjk> ribasushi: I don't yet grok the transaction problem.
[12:11:32] <ribasushi> instead of the conversation going on how it would be made to work (which is much much easier than defining how things will behave on paper)
[12:11:40] <tom_m> transactions are pretty simple - it's just a queue of tasks, much like Future::Utils::fmap
[12:11:48] <gbjk> ribasushi: the *user* doing transactions is one thing, dbic doing them itself is another.
[12:12:14] <gbjk> I'll come back to this much later.
[12:12:49] <ribasushi> gbjk: to rephrase - the user requesting an atomic-like operation may very well open a txn within dbic to be able to guarantee the atomicity
[12:13:22] <ribasushi> populate, delete|update_all (or complex rs delete|update), x_or_create, multicreate
[12:13:56] <ribasushi> and note - throwing exceptions on "eeeep can't async while in a txn" is fine, but it needs to be defined upfront
[12:14:02] <ribasushi> instead of being an afterthought
[12:14:08] <tom_m> no exceptions, just Future->fail
[12:14:50] <tom_m> and it's not a failure to request something while in a transaction, it's just "queue this task".
[12:14:50] <ribasushi> anyway - that's my take on async DBIC - nobody has come forward with what they would want to actually see as a finished-ish API and set of behaviors
[12:15:08] <tom_m> ("in a transaction" isn't something I tend to deal with - everything is in a transaction, just some transactions only have a single task)
[12:15:37] <ribasushi> tom_m: from an entirely async application pov - yes this is true
[12:16:02] <ribasushi> but as far as I can tell people actually are asking for a hybrid
[12:18:34] <tom_m> separate objects for sync/async? _sync methods that call the async ones with ->get on the end?
[12:18:37] <tom_m> if the issue is that no one seems clear on how the async parts should work, why not do the async parts, then at least you have the sync+async APIs and it's easier to see where they'd fit together?
[12:18:41] <ribasushi> gbjk: ^^ if you can digest all of the above into a "ok assume everything is possible, this is what I want", it'll be quite useful
[12:20:25] <ribasushi> tom_m: last sentence didn't make sense
[12:20:34] <ribasushi> no one seems clear on how the async parts should work, why not do the async parts <---
[12:21:10] <tom_m> Design+write "just async", rather than making the problem harder by trying to do hybrid as well in the same step
[12:21:52] <tom_m> since it sounds like the part you're not sure about is the hybrid bit, rather than async itself?
[12:23:08] <gbjk> ribasushi: Just read the backlog.
[12:23:15] <gbjk> I don't get why people would want hybrid.
[12:23:17] <ribasushi> "just async" is difficult due to "well just rewriting dbic for the sake of it makes no sens... what parts of DBIC are valuable to you anyway...? why not use tangence in the first place?"
[12:23:31] <ribasushi> gbjk: ^^ in case you don't want hybrid this last thing applies
[12:23:44] <gbjk> ribasushi: Exactly.
[12:23:54] <gbjk> ribasushi: I wasn't expecting it to be a *small* thing.
[12:24:25] <tom_m> because tangence doesn't provide features like query consolidation as you'd get with a "real" async ORM?
[12:24:51] <gbjk> As I feel it right now, it'd be a massive rewrite to orchestrate everything in terms of futures.
[12:24:55] <tom_m> and yes, I agree it's not a "small" thing, just seems to come up often enough that people want "async DBIC", not "some other async ORM" or "DBIC in a subprocess"
[12:25:08] <ribasushi> gbjk: why would one look at a rewrite at all is my question
[12:25:16] <gbjk> ribasushi: I've been working a lot on Net::Async::CassandraCQL recently, and luckily not needed dbixc on top of it.
[12:25:36] <gbjk> ribasushi: Because dbs can block, and not blocking is nice?
[12:26:16] <ribasushi> gbjk: async for the sake of async would be the death of the project (just like Catalyst with Moose for the sake of Moose)
[12:26:34] <gbjk> ribasushi: I don't think those are the same.
[12:26:41] <ribasushi> because it will add extra requirements and limitations for a vast majority of people not being able to take advantage of them
[12:27:18] <ribasushi> the other side of this question is - what does DBIC give you that you can't get elsewhere: it would be much more useful to isolate the parts that are "universally cool" and build a different project with moderate code reuse
[12:27:24] <tom_m> I don't think anyone's suggesting *replacing* the sync API?
[12:27:26] <gbjk> ribasushi: cat for moose is driven by "we want moose". Async is functional.
[12:27:33] <tom_m> async would need to be separate to the sync API but using as much of the query handling and underlying models as possible,
[12:28:14] <gbjk> tom_m: I think I'm saying that any sync api would have to be async underneath, and then just pull it immediately.
[12:28:15] <tom_m> there are things you can do with async that you can't with sync, and there are codebases that require the current sync API. I don't think it's possible (or wise) to bridge that gap with a single DBIC API
[12:28:48] <gbjk> tom_m: I agree two apis, but, as you said earlier, it should just ->get
[12:28:54] <tom_m> yeah - having sync as a layer on async seems feasible.
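The "sync as a layer on async" shape gbjk and tom_m converge on here — every sync method delegating to its async twin and blocking with ->get — can be sketched in core Perl. MiniFuture and Demo::ResultSet below are hypothetical stand-ins (for CPAN's Future and a DBIC-like resultset), not real DBIC or Future APIs:

```perl
use strict;
use warnings;

# Minimal stand-in for CPAN's Future: just enough to show the ->get pattern.
package MiniFuture;
sub new  { bless { ready => 0, result => [] }, shift }
sub done { my ($self, @r) = @_; $self->{ready} = 1; $self->{result} = \@r; $self }
sub get  { my $self = shift; die "future not ready\n" unless $self->{ready}; @{ $self->{result} } }

# Hypothetical resultset: the sync search() is just search_async() plus ->get.
package Demo::ResultSet;
sub new { my ($class, $rows) = @_; bless { rows => $rows }, $class }

sub search_async {
    my ($self, $cond) = @_;
    my @hits = grep { $_->{name} eq $cond->{name} } @{ $self->{rows} };
    return MiniFuture->new->done(@hits);    # resolved immediately in this sketch
}

sub search { my $self = shift; return $self->search_async(@_)->get }

package main;
my $rs   = Demo::ResultSet->new([ { name => 'a' }, { name => 'b' } ]);
my @rows = $rs->search({ name => 'a' });
print scalar(@rows), "\n";    # 1
```

In a real implementation search_async would resolve later via an event loop; the point is only that the blocking API costs one ->get call on top of the async one.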
[12:29:14] <ribasushi> let me make sure I get this right
[12:30:03] <gbjk> ribasushi: I don't *THINK* I'm about to sponsor this work, so I *think* the hypothetical conversation can wait until it's about to be implemented. Like you say, this happens every time.
[12:30:11] <ribasushi> Is your proposal to take a mature project, and rearchitecture it from the ground up to be an async application with a sync emulator on top, and have *every* current user take the complexity/performance penalty for the enqueueing and dequeueing of things?
[12:30:44] <gbjk> ribasushi: Not to sound flippant, but have you worked with futures much?
[12:31:06] <gbjk> ribasushi: I don't feel like this would be an "enqueueing and dequeueing" thing, or that there'd be a performance penalty.
[12:31:13] <tom_m> my proposal is "make an async-capable fork of DBIC, or at least a spec for one. see if it's useful, then think about the next steps"
[12:31:14] <gbjk> I'm not *even* thinking it'd be complex.
[12:31:28] <ribasushi> gbjk: yes, wrote an entire self contained Tickit-based curses app ;)
[12:32:13] <gbjk> ribasushi: I'll probably skunk look at this in a few months. Till then, thanks for indulging me with a discussion about it.
[12:34:34] <ribasushi> to rephrase - I am not at all against a fork (newly namespaced whatever) DBIC-like async thing
[12:34:45] <ribasushi> in fact I will be trying my best to share code between the two
[12:35:23] <ribasushi> making DBIx::Class (the exact namespace) hard-require an event loop, even for the traditional sync operation I think is suicide
[12:35:29] <ribasushi> gbjk: ^^
[12:37:41] <tom_m> A Future-based API does not impose an event loop requirement. There is likely to be a performance cost, but why not get the code in place and measure that first?
[12:38:47] <tom_m> anyway, just seems like it comes up often enough to be a useful project. I don't use DBIC myself but I would like to see an async version of it, even if it has a separate existence.
[12:40:05] <ribasushi> tom_m: point wrt the event loop, I just don't think of features without a loop, as then they are nothing more than overhead (massive overhead at that)
[12:40:31] <ribasushi> also my experience comes from places where the perl side is the main overhead (i.e. the RDBMS is virtually non-blocking from the code pov)
[12:41:22] <gbjk> ribasushi: Yeah, what tom_m said.
[12:41:45] <gbjk> ribasushi: You see futures as massive overhead? I've quantified this, and haven't found it to be true.
[12:41:50] <gbjk> ribasushi: I think I saw 4%
[12:42:04] <tom_m> it might be good motivation for a Future::XS implementation.
[12:43:39] <ribasushi> gbjk: perhaps we are talking about a different approach then...
[12:44:12] <ribasushi> I think mst weighing in at this point would be helpful, as we seem to be talking past each other
[12:46:17] <tom_m> a wider discussion would likely be beneficial, sure - it'd be good to get as much input from potential users as possible.
[12:46:19] <tom_m> and if it turns out there's only 4 or 5 people who'd ever use this, then maybe the amount of work required isn't justified =)
[12:47:02] <ribasushi> actually I am not afraid of work
[12:47:51] <ribasushi> I am much more concerned by the combinatorial explosion of complexity for all the downstreams (and the dbic userbase is incredibly large)
[12:48:04] <ribasushi> especially if that complexity benefits 4 or 5 people ;)
[13:02:24] <gbjk> tom_m: Future::XS.
[13:02:27] <gbjk> tom_m: Wow.
[13:02:39] <gbjk> tom_m: That jumped the shark and the pier.
[13:05:33] <tom_m> not keen on the idea?
[13:26:12] <gbjk> tom_m: Keen on the idea. Also want a unicorn butterfly kitten.
[13:26:28] <gbjk> OH. I forgot rainbow.
[13:26:41] <gbjk> https://s-media-cache-ak0.pinimg.com/736x/bf/3f/4c/bf3f4c4e4cbc909f957f939bb6bc7cc6.jpg
[13:27:46] * LeoNerd skims scrollback
[13:28:23] <LeoNerd> ribasushi: I think if you wanted async DBI with transactions, you'd have something like my $txn = $dbh->begin->get; $txn->query( "...." )->get; .... $txn->commit->get;
[13:28:24] <tom_m> the XS implementation should be easier than splicing DNA with photons, although I'd prefer a C Future library with XS bindings rather than just a port.
[13:28:42] <LeoNerd> I.e. all of the actual data operations and queries would be done via the transaction object; the DB handle itself would then simply be a factory of transactions
[13:28:51] <LeoNerd> This is how Python's ADBAPI works anyway
[13:29:25] <tom_m> supporting result lists in (non-perl) C code would make that a tricky prospect, I guess.
[13:30:53] <tom_m> $dbh->txn(sub { ... ->isa(Future) })->get => commit/rollback/release
[13:31:15] <LeoNerd> Yeah, that's then the obvious wrapper for it
[13:31:36] <LeoNerd> You could even do something like $dbh->configure( max_txns_concurrently => 5 )
[13:31:48] <LeoNerd> To suggest that $dbh->txn( ... ) won't even start the block yet, if there's a queue
[13:32:30] <tom_m> unless you're handling multiple connections at once I wouldn't try running any ->txn code until the begin tran; succeeds
[13:32:41] <LeoNerd> Hopefully you'd at least pick a number higher than 1 otherwise you'd lose most of the benefit :)
[13:33:32] <tom_m> I don't think there's a benefit to be had on most of the common databases, you don't get multiple active transactions on a single database connection?
[13:33:52] <tom_m> (also ->txn gets an object that looks like a handle, of course, for nested ->txn)
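The $dbh->txn(sub { ... }) shape discussed above can be sketched in core Perl: the block receives a handle-like object, a normal return commits, an exception rolls back. Demo::Handle is hypothetical; the Futures, concurrency limit, and the savepoints a nested ->txn would need are all elided:

```perl
use strict;
use warnings;

package Demo::Handle;
sub new   { bless { log => [] }, shift }
sub query { my ($self, $sql) = @_; push @{ $self->{log} }, $sql; return $self }

# txn: hand the block an object with the same API as the handle;
# commit if it returns normally, roll back (and rethrow) if it dies.
sub txn {
    my ($self, $code) = @_;
    $self->query('BEGIN');
    my @result = eval { $code->($self) };
    if (my $err = $@) {
        $self->query('ROLLBACK');
        die $err;
    }
    $self->query('COMMIT');
    return @result;
}

package main;
my $dbh = Demo::Handle->new;
$dbh->txn(sub { my $txn = shift; $txn->query('INSERT INTO t VALUES (1)') });
print join(' | ', @{ $dbh->{log} }), "\n";    # BEGIN | INSERT INTO t VALUES (1) | COMMIT
```

Because $dbh and $txn present the same API (as tom_m notes), a query outside any explicit txn is simply a transaction with a single task.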
[13:34:02] <gbjk> Hah! So, our use-case is c-tree isam, which we talk to via odbc.
[13:34:07] <gbjk> Just looked at DBD::ODBC.
[13:34:11] <gbjk> This road would not be easy.
[13:34:55] <gbjk> DBD::ODBC::FFI is probably needed.
[13:35:27] <gbjk> But that wouldn't fix the fact it couldn't async.
[13:35:35] <tom_m> what's the underlying protocol - in-process, or unix socket/tcp/something asyncable?
[13:36:20] <gbjk> TCP sockets.
[13:36:34] <tom_m> might as well just implement that in perl, then
[13:36:59] <tom_m> sqlite tends to be the odd one out, that pretty much needs a separate process/thread (or some nasty hacking around in the core to invert the idle callback thing)
[13:38:31] <gbjk> But ODBC.xs exposes dbdimp, which isn't looking easy.
[15:21:13] <frew> /1/w ribasushi
[15:21:15] <frew> woops
[15:58:45] <frew> ribasushi: so for me, major goals for async DBIC would include (off the top of my head)
[15:59:13] <frew> ribasushi: non-blocking, so I can do a create, update, delete, w/e, and while it's talking to the db do other stuff
[16:00:09] <frew> ribasushi: a less important, though nice, thing would be notifications
[16:00:22] <LeoNerd> Well tha'ts going to depend a lot on the DB engine
[16:00:34] <frew> ribasushi: some protocols allow you to subscribe to like, changes in a table, ODBC does, but DBD::ODBC requires it to be the while loop
[16:00:47] <LeoNerd> Simply "not blocking" is easy enough to do.. the principle is just do what DBIx::Async currently does... some implementations might be able to do it more efficiently than that, though
[16:00:51] <frew> LeoNerd: I know, I'm just listing all the things I can think of
[16:01:02] <tom_m> worth designing the concept (notifications) into the system from the start rather than trying to fit it in afterwards
[16:01:10] <LeoNerd> Yah
[16:01:20] <frew> yeah and notifications is a much more complex thing than non-blocking
[16:01:22] <tom_m> or at least putting it in the spec and dropping it later if it's too hard =)
[16:01:33] <frew> gbjk for example thinks that everything will fit in futures, and notifications never would
[16:02:06] <tom_m> would it help if I put a DBIx::Async::ORM implementation somewhere? I have three of them around, if I just pick the simplest it might at least serve as a base for playing around with concepts.
[16:02:28] <frew> ribasushi: I think the final thing would be for transactions to work
[16:02:48] <LeoNerd> tom_m: personally I'd like to start with a halfway-house that isn't an ORM
[16:02:48] <tom_m> Futures are for tasks, notifications are events or callbacks, and I think the separation is quite well-defined so that part shouldn't be too problematic
[16:02:52] <frew> ribasushi: the ORM would have to deeply control the $dbh for that to work, but it could be done I'm pretty sure
[16:03:10] <LeoNerd> Simply an async future-ish wrapper that takes/returns hashrefs of column names, doing as little possible magic
[16:03:13] <frew> tom_m: yeah I'm just saying gbjk sorta said to ribasushi that with Futures everything is easy
[16:03:36] <LeoNerd> $dbh->insert( "tablename", { column => 1, names => 2, here => "please" } )->get
[16:03:43] <frew> ribasushi: if you want more details on the txn thing let me know; I vaguely can see it in my mind, if only because Go does that.
[16:03:53] <tom_m> LeoNerd: like DBIx::Simple, you mean?
[16:04:02] <LeoNerd> I know that's not *really* using the R part of an RDBMS, but that's honestly all I do with them most of the time
[16:04:12] <frew> LeoNerd: yeah we want more ;)
[16:04:39] <LeoNerd> OHsure, but my simple requirements above can be hacked up in a couple of hours :) And just be over and done with
[16:04:49] <tom_m> and transactions are one of the first things to implement, queries just get a single-use transaction instance if they're not already part of one.
[16:04:53] <LeoNerd> And then I can actually *use* it to get some ideas and play around with better things
[16:04:58] <LeoNerd> Yah
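LeoNerd's "takes/returns hashrefs of column names" wrapper reduces, on the SQL side, to building placeholder SQL from a hashref. A core-Perl sketch of just that half (build_insert is a hypothetical helper; the async execution and the Future it would return are elided):

```perl
use strict;
use warnings;

# Turn a table name and a hashref of columns into parameterized SQL plus binds.
sub build_insert {
    my ($table, $cols) = @_;
    my @names = sort keys %$cols;    # deterministic order for the placeholders
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table, join(', ', @names), join(', ', ('?') x @names);
    return ($sql, map { $cols->{$_} } @names);
}

my ($sql, @bind) = build_insert('tablename', { column => 1, names => 2, here => 'please' });
print "$sql\n";    # INSERT INTO tablename (column, here, names) VALUES (?, ?, ?)
```

A real module would quote identifiers and hand ($sql, @bind) to the driver, resolving a Future with the result.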
[16:05:35] <ribasushi> frew: let's go back to the first thing
[16:05:40] <frew> k
[16:05:42] <ribasushi> create/update/delete
[16:05:44] <tom_m> suspect it'd take me too long to wade through DBIC to pick out enough things to get a usable toy, but like I say - transactions, ORM-style stuff, that I have already.
[16:05:48] <ribasushi> i.e. write operations
[16:06:14] <frew> ok (though I don't see why select shouldn't be included, I just didn't think to say it)
[16:06:45] <ribasushi> do reads that come after these writes get automatically queued in? or do they get a new handle and return "repeateble read"-style stale data?
[16:07:00] <gbjk> frew: 15:01 - I don't know, actually. Each notification could be the ready on a future, etc.
[16:07:14] <frew> gbjk: how can that make any sense?
[16:07:42] <frew> ribasushi: I don't think anything automatic can work
[16:07:48] <ribasushi> frew: I am not asking "which way can be made to work" - they all can be
[16:07:52] <gbjk> frew: I'm thinking that your notification api could be a promise to deliver a notification future.
[16:07:55] <ribasushi> I am more asking - what would you expect as a user
[16:08:08] <ribasushi> especially if you have a lot of dbic experience to draw upon
[16:08:22] <frew> gbjk: and how does it give two notifications at once?
[16:08:23] <gbjk> my $table_changes = $rs->notifications; $table_changes->on_ready(...)
[16:08:26] <gbjk> This smells, I think.
[16:08:39] <gbjk> frew: Seriously? That's your question!
[16:08:49] <frew> don't worry about it
[16:08:56] <frew> I'm just saying that futures don't answer everything
[16:08:57] <gbjk> frew: "at once" is the strangest phrase a dev ever says.
[16:09:01] <tom_m> requests in DBIC are effectively serialised at the moment - next one starts after the existing one completes. why not keep that pattern? it's nice and simple.
[16:09:17] <frew> ribasushi: well, as a DBIC + IO::Async user, whatI'd like would be
[16:09:35] <LeoNerd> I like the idea of doing everything via a txn, really. That's what ADBAPI does :)
[16:09:38] <LeoNerd> </brokenrecord>
[16:09:57] <gbjk> I think this is a lot more analogous with IO::Async::Notifier, but I was trying to keep just futures invading the api, rather than tying to an event system.
[16:10:18] <frew> ribasushi: $row->update(...)->then(sub { $rs->all })->then(sub { say $_->name for @_ })
[16:10:57] <gbjk> LeoNerd: Is there a pattern for a future factory built purely from futures for an event queue? Future returns ready each time, and ->get returns a new future you can use next, and the notification
[16:11:24] <LeoNerd> Not really.. that's more a streamy-sourcey-thing...
[16:11:31] <gbjk> Yeah.
[16:11:36] <ribasushi> LeoNerd: txns are not as practical as one might think (especially on big-corp engines)
[16:11:58] <gbjk> The aim is to let the consumer wire the futures into their event loop however they want.
[16:12:01] <ribasushi> so speccing around "everything is a txn" with no way out won't fly outside of a limited set of DBDs
[16:12:14] <frew> gbjk: why do you think that's better than letting the consumer wire in a callback?
[16:12:27] <LeoNerd> ribasushi: I didn't necessarily mean at the "talking to the DB" side
[16:12:29] <frew> gbjk: callbacks are much simpler in this case
[16:12:30] <LeoNerd> I meant at the API side
[16:12:30] <tom_m> if you start with on_item => sub { ... }, optionally returning a Future if you want to serialise / semaphore on notification handlers, that should be easy to tie into Ev::Dist or whatever system is around
[16:12:48] <gbjk> $future->on_done(sub { my ($future, $notification) = @_; $future->on_done(this_sub); $loop->adopt_future($future) })
[16:13:07] <gbjk> frew: I might have gotten blinkered and callbacks are just plain damn simpler.
[16:13:10] <ribasushi> LeoNerd: I am not sure what you mean then
[16:13:32] <LeoNerd> ribasushi: E.g. $dbh->do_txn( sub { my $txn = shift; $txn->select(....)->then( sub { $txn->insert( ... something based on select result ) } ) -> get;
[16:13:37] <gbjk> frew's right. if notifications are callbacks, who needs futures in there.
[16:13:41] <LeoNerd> Might not say anything about *transactions* over the wire, to the DB engine
[16:13:56] <gbjk> OTOH, the point was "not everything is as simple as Futures", but the example is significantly *simpler*
[16:13:59] <LeoNerd> It might simply serialise them internally within itself, not starting a new one until the currently-running one has finished.
[16:14:11] <gbjk> OTOH, it lacks an event loop
[16:14:13] <tom_m> the main thing a Future gives you in notifications is the ability to synchronise between several handlers for the same notification.
[16:14:32] <tom_m> but that's covered by returning a Future from the callback, rather than providing one to start with
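The split tom_m describes — one-shot Futures for tasks, callbacks for recurring notifications — in a core-Perl sketch. Demo::Notifier is hypothetical, and the "handler returns a Future to serialise" refinement is elided; the point is only that a callback fires once per event, where a Future resolves exactly once:

```perl
use strict;
use warnings;

package Demo::Notifier;
sub new     { bless { handlers => [] }, shift }
sub on_item { my ($self, $cb) = @_; push @{ $self->{handlers} }, $cb; return $self }
sub notify  { my ($self, $item) = @_; $_->($item) for @{ $self->{handlers} } }

package main;
my $n = Demo::Notifier->new;
my @seen;
$n->on_item(sub { push @seen, $_[0] });    # fires once per event, unlike a one-shot Future
$n->notify($_) for qw(INSERT UPDATE DELETE);
print scalar(@seen), "\n";    # 3
```

Wiring this into IO::Async or any other loop is then the consumer's choice, which is the separation gbjk was after.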
[16:15:01] <LeoNerd> ribasushi: A "transaction" being "a context that can be entered and left, and is the only thing you can use to perform actual data queries on." The DBH then becomes just a factory of transactions
[16:15:46] <tom_m> LeoNerd: note that $dbh and $txn can present the same API. calling methods on $dbh thus acts like the API-type transaction, without starting a real database transaction
[16:15:52] <tom_m> at least, that's how I do it
[16:15:55] <LeoNerd> Yah
[16:16:04] <LeoNerd> Ohyes.... shortcuts for one-shots on the DBH
[16:16:06] <ribasushi> LeoNerd: so what you call "transaction" is actually "serialized atomic instructions to the RDBMS" ?
[16:16:21] <LeoNerd> Yes.
[16:16:34] <frew> (hopefully prefixed with TXN BEGIN and then suffixed with TXN COMMIT?)
[16:16:37] <ribasushi> LeoNerd: how does this at all fit with actual rdbms-side transactions?
[16:16:38] <LeoNerd> It _might_ be backed by a DB-aware BEGIN/COMMIT cycle if the DBIxA:D decided it should
[16:16:56] <LeoNerd> Who knows? Maybe it's configurable?
[16:17:11] <frew> ribasushi: what did you mean about txn's not always being an option?
[16:17:17] <frew> ribasushi: are they too pricey on oracle or something?
[16:17:50] <LeoNerd> If $txn objects are not backed by RDBMS transactions, then there ought only be one outstanding at once. If they are, then it could keep multiple of them up in the air at once
[16:18:14] <LeoNerd> Ofcourse, you could even set it in a mode saying that you want automatic restart from the beginning (as it has the CODE ref, it can do that)...
[16:18:31] <frew> multiple in the air at once sounds like a handy way to create a deadlock in a single process ;)
[16:18:31] <LeoNerd> That starts to play into STM-style ideas.. what if the other code inside the txn has other side-effects
[16:18:33] <ribasushi> frew: on Sybase it's a new-connection-per-transaction, on Mssql with not-so-great drivers there are various strange looks taking place; on Oracle it's relatively ok actually ;)
[16:18:56] <frew> ribasushi: oh you mean like the MARS stuff or w/e
[16:18:59] <frew> ribasushi: gotcha
[16:19:00] <ribasushi> yes
[16:19:09] <frew> (Which even on windows we don't use.)
[16:19:12] <frew> (fwiw)
[16:19:13] <LeoNerd> frew: With a little bit of magic, it sounds like a handy way to create *debuggable* deadlocks that can introspect and say "Oh look you dummy, you made a deadlock by doing this, so I'm going to die now"
[16:19:29] <ribasushi> LeoNerd: in reality almost all code that talks to an rdbms has side effects outside of the rdbms
[16:19:36] <frew> LeoNerd: yeah I'm not saying not to support it, just that I know it would happen
[16:19:57] <mst> ribasushi: not sure I see how half a dozen extra sub calls per db roundtrip would make *that* much difference, bear in mind by 'roundtrip' I mean 'complete query'
[16:19:59] <LeoNerd> ribasushi: Righty.. so make it the default option not to do that. But allow it by request, because I could imagine wanting to do that
[16:20:01] <tom_m> http://paste.scsys.co.uk/463416
[16:20:23] <LeoNerd> ribasushi: if I *know* my entire txn body code is side-effect-free, or at least, idempotent, then I'm quite happy for failed commits to retry from the beginning
[16:20:25] <mst> ribasushi: if you wrapped fetching each row in a future, suddenly the overhead would get pretty heinous, sure
[16:20:53] <ribasushi> mst: if all we are doing is wrapping subs - sure, but then (as frew noted) I don't see what does this buy anyone
[16:21:22] <tom_m> LeoNerd: just let Future::Utils::* handle the retries and other logic, no need to push that into the DBIx::whatever layer?
[16:21:25] <mst> ribasushi: I'm sorry? I didn't say 'wrapping subs', and I've no idea what that means
[16:21:34] <mst> please don't conflate random things other people said with what I'm saying
[16:21:43] <LeoNerd> tom_m: mm? Well, sure..
[16:21:57] <ribasushi> sorry, when I said subs I meant futures (since they are just that in my head)
[16:22:39] <mst> ribasushi: ok, so, the only point in DBIC that I can see wanting to asyncify is basically $rs->cursor
[16:22:49] <LeoNerd> Hrm? That's possibly where you're going wrong then - they're exact opposites. :) A sub is something you -pull- results from when you want it to run, by invoking it. A Future is something that -pushes- results to you when it has finished running, all of its own accord.
[16:23:08] <frew> gbjk: ^^
[16:23:38] <mst> ribasushi: did you look at any of tom_m's implementations yet?
[16:23:55] <mst> tom_m: did you get round to linking the simple one yet?
[16:24:25] <LeoNerd> I'm beginning to feel at this stage it's becoming "Code or STFU" - maybe take a moment to hack up a quick concrete example. Often easier to discuss concretes you can point at than abstracts
[16:24:46] <tom_m> mst: not yet, unless you mean DBIx::Async
[16:24:49] <ribasushi> I have vaguely looked through tangence, and I can't find anything to hang on to, since the focus is on treating the rdbms as a data store, with the txn machinery being more or less ignored
[16:25:04] <tom_m> tangence is very different from this, I don't think it's particularly relevant?
[16:25:06] <LeoNerd> Tangence is nothing to do with DBI
[16:25:18] <LeoNerd> Tangence is more like... CORBA, or SOAP, or any of that crowd
[16:25:24] <ribasushi> mst: also "wrapping ->cursor" is not something that either gbjk or frew mentioned as a primary pain point
[16:25:40] <tom_m> and I agree with LeoNerd, real code would seem to be useful at this point.
[16:25:58] <ribasushi> I also agree about real code being quite useful
[16:26:04] <LeoNerd> Since I already have a little application doing SQLite things in a mostly async/future-ish way at home, using Ev::Dist, maybe that's a place I'll start from
[16:26:14] <ribasushi> I am just wary of the push to prototype it within the guts of dbic ;)
[16:26:22] <LeoNerd> It -currently- does sync SQLite, because it doesn't really matter, but I can rewrite it for an example
[16:26:46] <frew> LeoNerd: the hard thing, at least to me, is how does one make this work with actual remote db's
[16:26:52] <LeoNerd> .oO( Though I might play with Prometheous instead )
[16:26:55] <LeoNerd> (SP)
[16:27:03] <frew> LeoNerd: since they all have their own protocol or w/e
[16:27:09] <LeoNerd> frew: Ah.. well. Mmmm... unless you have some particular driver that supports it,..
[16:27:12] <LeoNerd> Yes; that's the thing
[16:27:13] <frew> right
[16:27:14] <mst> ribasushi: by and large, when people say "I want DBIC to do async" it seems to me they're saying "I want my program to continue doing things between the point where I send a SELECT to the db and the point where it has results for me to process"
[16:27:42] <LeoNerd> frew: The current implementation of DBIx::Async just makes a sidecar process to hold a real DBI + DBD::foo pair inside. It's simple and JustWorks for any DB engine, but often isn't the most efficient
[16:27:48] <frew> I'd really like notification too, but I've gotten around that by not using the database anymore
[16:27:55] <tom_m> frew: once it works with an out-of-band process via DBI, we just write the protocol driver? they're mostly quite simple
[16:27:58] <frew> LeoNerd: right
[16:28:13] <LeoNerd> frew: My plan is verymuch to write a spec/interface of DBIx::Async::DBD::* and some implementations of it for those DB engines that *do* support it, or at least, whose wire format is easy enough to recreate
[16:28:15] <mst> frew: your primary pain point is selects, right?
[16:28:33] <LeoNerd> Then the DBIx::Async->new() constructor can just optionally try one of those if it's available, or fallback to the "simple but slow" IaProcess wrapper
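The constructor fallback LeoNerd describes — prefer a native in-process DBD-equivalent where one exists, otherwise the "simple but slow" subprocess wrapper — is essentially a one-line dispatch. All driver names here are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical registry of engines that have native async drivers.
my %native = ( Pg => 'Demo::DBD::Pg', mysql => 'Demo::DBD::mysql' );

# Prefer the in-process driver; fall back to the subprocess-backed one.
sub pick_driver {
    my ($engine) = @_;
    return $native{$engine} // 'Demo::DBD::Subprocess';
}

print pick_driver('Pg'),   "\n";    # Demo::DBD::Pg
print pick_driver('ODBC'), "\n";    # Demo::DBD::Subprocess
```

This keeps "works for any DB engine" as the floor while letting engines with an asyncable wire protocol opt in to something faster.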
[16:28:41] <tom_m> LeoNerd: do you want to write your own from scratch, or would the existing code I have help? I don't want to spend the time on it if it's not going to be useful, that's all
[16:28:44] <frew> mst: well really, I want *everything* to be non-blocking, but as of now my primary pain point is that I cannot get notifications
[16:28:46] <LeoNerd> So you would have to write a DBD-equivalent for any kind of DB engine
[16:29:10] <LeoNerd> tom_m: Ah, yes at this point I think taking a look at that would be good :)
[16:29:11] <frew> mst: my db tends to be fast enough that nothing causes pain except that I can't use IO::Async to wait on table changes (which mssql can do)
[16:29:24] <tom_m> DBIx::Async already does this (the ::Worker class), so I could adapt NaPostgreSQL so there's at least one in-process example
[16:29:30] <mst> hah, right, so that's a completely different problem :D
[16:29:36] <gbjk> frew: You said ^^, and I don't know why.
[16:29:53] <mst> frew: as in, you want a callback from the db when a table's data gets changed
[16:29:58] <frew> gbjk: LeoNerd had a good explanation for where Futures and callbacks can be used, thought you might care
[16:30:16] <frew> mst: that's something I'd very much want, yes (which I think was the second thing I told riba)
[16:30:20] <ribasushi> mst: for anything I've worked with (which arguably is not much) selects were never a pain point
[16:30:26] <LeoNerd> gbjk: Oh, while you're here: you probably know Cassandra and CQL more than I do at this point.. Do you imagine that a DBIx::Async::DBD::Cassandra would ever be useful? Is CQL "close enough" to SQL that keeping it as a drop-in option in those simple cases where people don't do JOINs or whatever will work?
[16:30:32] <ribasushi> mst: so I suppose this is where the disconnect comes from
[16:31:08] <frew> though I do see value in having *everything* non-blocking, as that allows you to (sorta) keep running if your db goes down
[16:31:15] <LeoNerd> As in, I'd be interested to consider DBIx::Async->new( "cassandra:localhost:whatever", keyspace => "foo" )->insert( "columnfamily" => { columns => "go", here => "now" } )->get;
[16:32:45] <LeoNerd> Ohyes.. This is my other reason for wanting a specifically cut-down "you can't do any arbitrary SQL query" simple DB layer - it still works with things that aren't real SQL engines; like Cassandra and Dynamo and *gasp* BigTable, and whatever else
[16:32:56] <gbjk> LeoNerd: Yes, I believe that you could have a DBD::CQL. It's close enough to SQL that "buyer beware" should be enough.
[16:33:11] <gbjk> TTL is used lots, obviously.
[16:33:30] <LeoNerd> Mmmyes.. Things like TTL start to become useful there :)
[16:33:31] <mst> tom_m: so, experiences seem to be differing
[16:33:34] <mst> who's waiting for what?
[16:33:39] <gbjk> LeoNerd: We got *so* caught out by insert... using ttl 60; followed by update clearing the *column* ttl, so never expiring.
[16:33:44] <LeoNerd> Hah!
[16:33:46] <gbjk> TTL is all the win.
[16:33:58] <gbjk> But TTL per column actually being what you get is surprising.
[16:34:03] <mst> IME, the only time my DB stuff blocks to an extent I care about it is complex SELECTs
[16:34:15] <tom_m> mst: at this point I'm tempted to say "I'm not waiting for anything, 'cos I'm all async all the time" =) But seriously, there may be different enough use-cases that it's worth writing up a shortlist
[16:34:15] <LeoNerd> Given I want to generate lots of short-lived monitoring data, and do my own rolling downsampling, those TTLs would be very good for me
[16:34:23] <tom_m> for me, "streaming data from/to tables", listen/notify, and "slow select" are some of the main async cases, that first one particularly in ETL context where any part of the pipeline might be slow.
[16:34:27] <gbjk> mst: Welcome to ISAM backed ODBC, where *simple* SELECTs shit the bed.
[16:34:47] <gbjk> mst: select count(id) from resfil where resfil is just 1 million rows takes like 16 seconds.
[16:34:52] <tom_m> and "I have no control over the time taken by even the simplest of queries" is another async motivator
[16:35:01] <tom_m> ^ a bit like that
[16:35:16] <LeoNerd> For me, the primary use-case of async DBI would simply be that my spinning-rust is slow, and I don't want to block all of networking IO just waiting for a read(2) call to get the right piece of metal under the magnet
[16:35:38] <gbjk> Fundamentally, if you're writing async apps, you *never* want blocking.
[16:35:38] <mst> see, in my world inserts/updates/deletes are usually effectively free
[16:35:50] <gbjk> node.js somewhat learnt that lesson well early.
[16:35:56] <gbjk> mst: FSVO free.
[16:35:58] <tom_m> there's also various nice-to-haves that async DB layers give you - query consolidation, for example. Maybe those are out of scope at this stage.
[16:36:06] <mst> and I'd spend longer pissing about with event loop spin than I would waiting for them
[16:36:11] <gbjk> Probably the same "value of" that you can get free blowjobs from a prostitute.
[16:36:30] <LeoNerd> It's about latency and throughput, though
[16:36:32] <gbjk> mst: Sorry, I missed your lack of select. I completely agree.
[16:36:38] <gbjk> mst: Oh, wait.
[16:36:51] <LeoNerd> If you're doing largely-bulk processing in which *every* operation hits the DB anyway, then you might as well just block
[16:36:56] <gbjk> mst: We have a 60 million record table that stores the last 6 hours of some stuff.
[16:37:16] <LeoNerd> But if you *can* perform lots of other network IO that is unrelated to database disk, then it's nice not to have the disk unfairly hold up the queue
[16:37:20] <gbjk> mst: The reason we just moved to cassandra is because updates block for up to 5 seconds every few hours when postgresql decides to groom itself a bit.
[16:37:27] <mst> gbjk: ahhh, right
[16:37:35] <LeoNerd> Hah!
[16:37:36] <gbjk> I was tempted to put up a picture of a dog licking its balls when that happens.
[16:37:40] <gbjk> "I'm busy cleaning"
[16:37:57] <LeoNerd> I'd be curious to hear if you think Cassandra is *any* better-behaved than that
[16:38:00] <gbjk> But at 300 requests per second, that gets painful awfully fast.
[16:38:01] <LeoNerd> My experience of it was that it wasn't
[16:38:04] <gbjk> LeoNerd: It is.
[16:38:12] <gbjk> LeoNerd: Two secs I'll take a screenshot.
[16:40:23] <gbjk> http://i.imgur.com/DgHSPGs.png
[16:40:53] <LeoNerd> Hrm... how many nodes in that cluster?
[16:41:24] <gbjk> Actually, this is better. http://i.imgur.com/UqLizHP.png
[16:41:28] <gbjk> LeoNerd: Just 3.
[16:41:41] <LeoNerd> Hrm... I had three too
[16:42:01] <LeoNerd> Well,.. I say "three". I had two on one machine with two different IPs, as I didn't have three machines capable of *running* cassandra
[16:42:15] <LeoNerd> I did find a number of bugs in it because of that
[16:42:36] <LeoNerd> E.g. Despite having its IP address, each node uses the MAC address of its first ethernet card as its identity for the vector clock
[16:42:38] <gbjk> LeoNerd: We just did our first restart of all the cassandras live.
[16:42:42] <mst> gbjk: even so, it strikes me that for the majority of users, $rs->next_f and $rs->all_f would probably be 99% of what they needed
[16:42:53] <gbjk> mst: You're entirely right.
[16:42:55] <LeoNerd> ... so having VCs comprised of two identical addresses from two different nodes makes for Fun Times.
[16:43:07] <gbjk> LeoNerd: That's why I was talking to you about ports, etc.
[16:43:14] <mst> gbjk: and for the majority of cases of "complex insert/update/etc." a sidecar process would be good enough
[16:43:16] <gbjk> LeoNerd: I solved this with docker and RFC 1918 addresses.
[16:43:21] * LeoNerd nod
[16:43:22] <mst> also that way the txn problem basically goes away
[16:43:37] <LeoNerd> Provided they all have their own MAC address visible, it's likely fine
[16:43:40] <gbjk> LeoNerd: *But*, if you tell it all the servers and ports, and ignore the ones peers tells you, you can live with cassandras on different ports now.
[16:43:44] <mst> gbjk: ok, your 'updates are fast, except occasionally you roll a 1' case isn't quite so amenable to that, but that's not as common a case, right?
[16:43:48] * LeoNerd nod
[16:44:33] <gbjk> mst: Exactly. Sometimes all updates take 6 seconds isn't the same as "Every single fucking select like this will take 16-200 seconds"
[16:47:29] <ribasushi> right... at which point none of this has to do with DBIC internals anymore, as both the cursor class and the base resultset are extendable, and one only needs to hook the already existing APIs to provide something more sophisticated
[16:48:08] <ribasushi> (frew's DBIC::Helpers have many examples of exactly this kind of stuff)
[16:48:37] <frew> ribasushi: I'd be willing to make some async helpers if only as skunkworks
[16:49:00] <frew> though as we discussed, it's unlikely that it can work for me in prod (ODBC protocol, win32 + forks)
[16:51:15] <gbjk> frew: Fuck... that.
[16:51:24] <gbjk> I need ODBC for this to be viable.
[16:51:30] <gbjk> ( For my prod )
[16:52:05] <mst> it seems like using the hackery in Mojo::Pg to prototype next_f and all_f first might be a good way forwards
[16:52:29] <mst> then we can replace it with a proper async proto implementation later
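(Editor's note: `next_f`/`all_f` do not exist in DBIC; the following is a rough, hypothetical sketch of how a Future-returning `all_f` could be prototyped by pushing the blocking fetch into a forked `IO::Async::Function` worker — the same fork-based approach ilmari attributes to dakkar further down. `My::Schema` and `@My::App::CONNECT_INFO` are placeholders; rows come back as plain hashrefs via HashRefInflator because inflated row objects and resultset state can't cross the fork boundary.)

```perl
package My::Schema::ResultSet;
use strict;
use warnings;
use parent 'DBIx::Class::ResultSet';

use IO::Async::Loop;
use IO::Async::Function;

my $loop = IO::Async::Loop->new;

# The worker forks, so the child connects on its own -
# DBI handles do not survive fork()
my $fetch_all = IO::Async::Function->new(
    code => sub {
        my ($connect_args, $source_name, $cond, $attrs) = @_;
        my $schema = My::Schema->connect(@$connect_args);
        return [
            $schema->resultset($source_name)->search($cond, {
                %{ $attrs || {} },
                result_class => 'DBIx::Class::ResultClass::HashRefInflator',
            })->all
        ];
    },
);
$loop->add($fetch_all);

# $rs->all_f($cond, $attrs) - returns a Future resolving
# to an arrayref of plain hashrefs
sub all_f {
    my ($self, $cond, $attrs) = @_;
    return $fetch_all->call(
        args => [
            [ @My::App::CONNECT_INFO ],   # placeholder connect args
            $self->result_source->source_name,
            $cond, $attrs,
        ],
    );
}

1;
```

This sidesteps the transaction question entirely (as mst notes, the txn problem "basically goes away" with a sidecar/fork), at the cost of serialising rows across the process boundary.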
[17:15:40] <frew> gbjk: I'm with you, just saying that those two things taken together already make me very much a minority
And then a tad later
[11:21:55] <quicksilver> ribasushi: can DBIC be coerced into working with any of the perl async libraries?
[11:23:25] <ribasushi> quicksilver: this shows up regularly as a question, and deadends into insufficient definition of "working"
[11:23:56] <ribasushi> quicksilver: since I am almost never in a situation that the RDBMS is a bottleneck, I can't "speak from experience" either
[11:24:34] <quicksilver> I must admit it's an idle question at the moment
[11:24:44] <quicksilver> we do have some slow queries
[11:24:45] <ribasushi> quicksilver: if all you want is to get the reads to happen async - you simply write a custom cursor class with whatever magic you need and tell DBIC to use that
[11:25:07] <quicksilver> it would be rather elegant to have a single perl process skilfully asyncing the slow queries
[11:25:14] <quicksilver> and delivering the results back to clients when they are ready
[11:25:28] <quicksilver> but it is not the only solution or necessarily the best :) idle thoughts.
[11:26:56] <ilmari> quicksilver: dakkar is doing that by sticking all the DBIC work in an IO::Async::Function, which just forks to do the work
[11:28:00] <quicksilver> ilmari: fork()'ing sounds slightly yuck :)
[11:28:13] <quicksilver> but I can see it solves a problem sometimes
[11:30:38] <quicksilver> ilmari: does he serialise it (JSON? Freeze) for return to the main process?
[11:30:46] <AndrewIsh> Hi guys. Does anyone know if dbic has any support for PostgreSQL's json datatype?
[11:30:59] <ribasushi> AndrewIsh: support in what regard?
[11:32:08] <ribasushi> quicksilver: there is also a rather well thought-through article by my "snakish twin" http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
[11:33:50] <ribasushi> the tldr (with which I largely agree): "async all the things" is misguided at best, since to be effective the async flow usually needs to be very closely tailored to the problem at hand, making it rather non-generalizable
[11:34:54] <quicksilver> yes
[11:35:22] <quicksilver> and if you need to improve performance in some respect you need to understand your own bottlenecks
[11:35:52] <quicksilver> as it happens our bottleneck is not frontend cpu (so it's untrue that "perl is very slow compared to my database" for me)
[11:36:29] <quicksilver> our bottleneck is either (depending what we're doing) frontend memory or DB locks or DB CPU or DB IO
[11:38:13] <quicksilver> good blog post tho
[11:38:40] <ribasushi> o.O you already read it? :)
[11:39:40] <quicksilver> I skimmed it and read some paragraphs which seemed most relevant
[11:40:09] <quicksilver> there are loads of things I can do to improve our architecture
[11:40:23] <quicksilver> async DBIC would just be one facet of one possible approach but nice to know if it's possible
[11:42:14] <quicksilver> it's noce to be important but it's more important to be able to type.
[11:42:16] <ribasushi> quicksilver: from a technical perspective things are decoupled and overridable enough to make almost anything possible
[11:42:45] <ribasushi> quicksilver: it's the "how do you actually want this thing to look when confronted with corner cases" that is problematic
[11:42:53] <quicksilver> nod
Then *much later*
<jberger> I started reading just a couple minutes ago
<jberger> I thought oh ill just do it quickly now
<jberger> but it goes on forever
<ribasushi> :)
<ribasushi> basically - your "untrained eye" would be extremely useful for me here
<ribasushi> which is why I keep poking you to read it
<jberger> I'm actually in dc atm
<jberger> so I don't have large quantities of Foss time
<jberger> but I will
<jberger> sometime
<jberger> I promise
<ribasushi> ++
<jberger> if I may say though
<jberger> my idea has always had a bit of: let dbic generate the query, I'll run it and then had dbic back the raw results to unpack into objects
<jberger> hand back
<jberger> then again that may show my complete ignorance of the dbic internals/workflow
<ribasushi> this is roughly how it works internally yes, some of the hook-points are not exposed as well as we could have (yet)
<ribasushi> the point is - you already have the APIs to do that today (i.e. the generator for SELECTs is *entirely* decoupled)
<ribasushi> $rs->as_query works anywhere
<ribasushi> at which point "async DBIC" becomes even less clear as in "what do you want as a user anyway...?"
And a tad later again
<jberger> finally read it
* jberger wipes brow
<jberger> sorry it took so long
<jberger> ok, so I'm responding to "everyone wants async but no one knows what that looks like" yeah?
<ribasushi> more or less yeah
<ribasushi> that is - the *technical* part is easy, given how decoupled DBIC is
<ribasushi> but the *what* seems insurmountable
<jberger> then maybe what I want is doc
<jberger> is there some doc section that is titled "how do I execute the query manually? "
<ribasushi> not sure I follow - explain
<jberger> you said that there is an "as_query" method
<jberger> is there a similar "generate resultset from query result" method?
<jberger> and assuming that both of those exist, where is the documentation so that we can feature it
<ribasushi> you don't generate a resultset from a query - a query can only result in errr Result instances
<jberger> sorry, yes
<jberger> I used the wrong term there
<ribasushi> this is where it gets tricky - you can't just do blindly "here is a $sth, make me objects out of it"
<jberger> I just want to take over the part where the query is run
<ribasushi> because there needs to be out-of-band information "what is in this $sth"
<jberger> but if it can build me the query, doesn't it know what the response is likely to be?
<ribasushi> a ::Resultset instance holds the entire "query plan" *including* full metadata of what each column returned by an eventual resultset will hold
<ribasushi> there is a relatively clean internal codepath where you say "given these columns - assemble me objects"
<ribasushi> except it is not exposed as a 1st class API, as it isn't clear what you'd do if you had it
<ribasushi> so with that said:
<ribasushi> assume there is an API that does exactly what "you think it will do"
<ribasushi> how would you string it together?
<jberger> my $sql = $rs->as_query; $pg->query($sql, sub { my ($pg, $err, $res) = @_; $rs->import_results($res); #do something with results })
<jberger> in that way the transport is completely up to the user
<ribasushi> what is $res in this case? a $sth or an AoA ?
<jberger> whatever it is documented to need to be I suppose
<jberger> I would rather it be AoA
<ribasushi> wait - I mean $pg->query is a non-dbic thing isn't it?
<jberger> my $pg = Mojo::Pg->new
<ribasushi> hm hm hm, this is doable-ish
<jberger> :D
<ribasushi> one problem is DBIC needs to know what is it you are talking to, so it can generate you the correct $sql
<jberger> hmmmm, true
<jberger> does it do that at the last minute or is that something that can be configured
<ribasushi> that would be achievable with $schema = My::Schema->connect(sub { $pg->whatever_gives_back_dbh })
<ribasushi> (this also gives you proper dbh sharing)
<ribasushi> so that's not a big deal
<ribasushi> you know what... I think we are *almost* there anyway
* ribasushi thinks more
* ribasushi curses mst
<ribasushi> so
<ribasushi> I need to run now
<ribasushi> BUT
<ribasushi> the main mechanism would be
<jberger> (cursing mst is always fun)
<ribasushi> my $rs = <do everything as usual here>; $rs->{cursor} = Some::Object::Which::Implements( all(returns a list of arrayrefs), next (returns a *list* of values or ()), reset(re-issues the query, makes next() work again) ) ; $rs->all/next/first as usual
<ribasushi> the reason to do ->{cursor} is because there is no setter for it (historic reasons I guess)
<ribasushi> but the above is all you'd need to do really - you can give it a shot relatively easily
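(Editor's note: a minimal, hypothetical illustration of the three-method cursor contract ribasushi describes above — `all`, `next`, `reset` — pre-loaded with rows that were fetched elsewhere, e.g. by an async query. A real implementation's `reset()` would re-issue the query rather than just rewind; the class name is made up.)

```perl
package My::Prefetched::Cursor;
use strict;
use warnings;

sub new {
    my ($class, $rows) = @_;   # $rows: arrayref of arrayrefs, one per row
    return bless { rows => $rows, pos => 0 }, $class;
}

# all(): returns a list of arrayrefs
sub all { @{ $_[0]{rows} } }

# next(): returns a *list* of column values, or () when exhausted
sub next {
    my $self = shift;
    my $row = $self->{rows}[ $self->{pos}++ ];
    return $row ? @$row : ();
}

# reset(): makes next() work again (a real cursor would
# re-issue the query here instead of rewinding)
sub reset { $_[0]{pos} = 0; $_[0] }

1;

# usage, per the (unofficial) ->{cursor} hatch mentioned above:
# $rs->{cursor} = My::Prefetched::Cursor->new(\@rows_from_async_fetch);
# my @objects = $rs->all;   # inflation proceeds as usual
```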
<jberger> looks reasonable
<ribasushi> if it works, and does what you expect - I will look into making this via a more official API
<ribasushi> sounds like a plan?
<jberger> yeah, I hope I get some time to try it soon
<jberger> I'm working today as it is
<ribasushi> ok, will wait for feedback, will poke you in a week if I don't hear anything
<jberger> (catching up from the vacation)
<ribasushi> are you familiar with the \[ $sql, @bind_tuples ] format of ->as_query?
<jberger> no
<ribasushi> the tuple format is listed here: https://metacpan.org/pod/DBIx::Class::ResultSet#DBIC-BIND-VALUES
<ribasushi> i.e. massaging will be needed to feed it to $pg, maybe as simple as grabbing [1] of every tuple
<jberger> that will likely be what I try first
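(Editor's note: a hedged sketch of the "massaging" ribasushi mentions, combining jberger's earlier Mojo::Pg snippet with the documented `\[ $sql, @bind_tuples ]` shape of `->as_query`. The DSN is a placeholder and the callback body is illustrative only.)

```perl
use strict;
use warnings;
use Mojo::Pg;

# ->as_query returns a reference to [ $sql, @bind_tuples ]
my ($sql, @bind_tuples) = @{ ${ $rs->as_query } };

# each tuple is [ \%attrs, $value ] - keep just the value, as suggested
my @bind_values = map { $_->[1] } @bind_tuples;

my $pg = Mojo::Pg->new('postgresql://localhost/test');   # placeholder DSN
$pg->db->query($sql, @bind_values => sub {
    my ($db, $err, $results) = @_;
    die $err if $err;
    my $rows = $results->arrays;   # AoA, ready to hand back to DBIC
    # e.g. feed $rows into the cursor-override trick discussed earlier
});
```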