I have a database logging operation that takes about 1/3s to 1/2s. I do a lot of these logging operations, enough that I am worried they could become a bottleneck that keeps my app from feeling snappy.
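To ground this, here is a minimal sketch of what a log_event() like mine could look like; the table layout and column names are illustrative, not my exact schema:

log_event <- function(con, schema, prefix, event, session) {
  tbl <- paste0(prefix, "_log")
  DBI::dbExecute(
    con,
    glue::glue_sql(
      "INSERT INTO {`schema`}.{`tbl`} (event, session) VALUES ({event}, {session})",
      .con = con
    )
  )
}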
bench::mark(log_event(con, schema=schema, prefix = prefix, event = "benchmark", session = "session"))
# # A tibble: 1 x 14
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time result memory time
# <chr> <bch> <bch> <bch:> <bch> <dbl> <bch:byt> <dbl> <int> <bch:tm> <list> <list> <lis>
# 1 "log_even… 376ms 379ms 379ms 382ms 2.64 44.5KB 0 2 758ms <data… <Rpro… <bch…
# # ... with 1 more variable: gc <list>
I wanted to explore whether throwing that logging operation into another process would speed things up. After all, I only need:
- the connection object
- the query
Two approaches here:
- callr (except it does not support serializing a DBI::connection object)
- future / promises (which are supported well in Shiny)
Faster, but no dice
explore2 <- function(query) {
  # `query` must be passed via `args`; r_bg() does not carry over the calling environment
  callr::r_bg(function(query) { Sys.sleep(1); print(query) }, args = list(query = query))
}
# results of bench::mark
# # A tibble: 1 x 14
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time result memory
# <chr> <bch:> <bch:> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <bch:tm> <list> <list>
# 1 {... 27.3ms 45.7ms 43.7ms 70.7ms 21.9 94.1KB 0 12 548ms <S3: … <Rpro…
# # ... with 2 more variables: time <list>, gc <list>
I suspect this is largely because there is no object copying etc. from the parent session: it is just a bare R session starting up and having the work handed off to it.
Of course, this doesn't actually work, because I cannot serialize the connection. I guess I could pass the connection parameters instead, but that will get annoying... (still... it runs in about 1/10th of the time)
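Worth noting: r_bg() returns a process handle immediately, so the benchmark above is really timing the spawn, not the work finishing. A quick usage sketch (the query string is just for illustration):

p <- explore2("SELECT 1")
p$is_alive()    # TRUE while the background session is still working
p$get_result()  # blocks until the process finishes, then returns its value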
There are some thoughts about using callr::r_session$new(), but it has some limitations as well.
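Roughly, an r_session is a persistent background R process that stays warm between calls; a minimal sketch of the API:

rs <- callr::r_session$new()      # start one persistent R process
rs$run(function() Sys.getpid())   # synchronous call in that process
rs$call(function() Sys.sleep(1))  # asynchronous call; returns immediately
rs$poll_process(2000)             # wait up to 2 seconds for it to finish
rs$read()                         # collect the result message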
Faster... but not worth the effort
The time taken is the same order of magnitude as the synchronous call, so there is little benefit, and definitely not enough to warrant the added complexity.
library(future)
# a non-sequential plan must be set first, e.g. plan(multisession) or plan(multicore)

explore <- function(query, con) {
  future({ Sys.sleep(1); DBI::dbGetQuery(con, query) })
}
# results of bench::mark
# # A tibble: 1 x 14
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time result memory time
# <chr> <bch> <bch> <bch:> <bch> <dbl> <bch:byt> <dbl> <int> <bch:tm> <list> <list> <lis>
# 1 {... 223ms 229ms 224ms 240ms 4.37 55.6KB 0 3 687ms <S3: … <Rpro… <bch…
# # ... with 1 more variable: gc <list>
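Since promises pair with future in Shiny, the same future could be chained rather than waited on; a sketch assuming the promises package:

library(promises)
then(
  explore(query, con),
  onFulfilled = function(result) message("logged at ", Sys.time()),
  onRejected  = function(err) warning("logging failed: ", conditionMessage(err))
)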
I would love to either:
- have r_bg be able to serialize a connection object, or
- be able to run a persistent background process (that can be restarted / etc. if it fails) that I can throw my async processes to (in a sort of queue, if you will).
I made such a request, and we shall see what comes of it!
For now, I will probably just stay synchronous. It is easier to reason about, and not a whole lot slower than the only functional alternative I have come up with. 1/3s to 1/2s does not seem slow enough to gain much benefit from asynchronous code in R today. Either that, or I will encode my connection parameters so that the async process can open a connection itself. That's a lot of overhead for each background process (creating and dropping a DB connection), though.
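If I go that route, it might look something like the following (the driver and connection arguments are placeholders for whatever a given setup needs):

log_event_bg <- function(conn_args, statement) {
  callr::r_bg(
    function(conn_args, statement) {
      con <- do.call(DBI::dbConnect, conn_args)
      on.exit(DBI::dbDisconnect(con))
      DBI::dbExecute(con, statement)
    },
    args = list(conn_args = conn_args, statement = statement)
  )
}

# e.g. log_event_bg(
#   list(drv = RPostgres::Postgres(), dbname = "mydb"),
#   "INSERT INTO app_log (event) VALUES ('benchmark')"
# )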