Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Comparing the time that a query takes to run using three different models of database connection: one connection per program/app; one connection per query; using a connection pool.
---
title: "Benchmarking database queries"
author: "Barbara Borges Ribeiro"
date: "June 23, 2016"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(DBI)
library(pool)
```
### One connection per app
```{r app}
app_times <- numeric(0)
conn <- DBI::dbConnect(
drv = RMySQL::MySQL(),
dbname = "shinydemo",
host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
username = "guest",
password = "guest"
)
for (id in 1:30) {
time <- system.time({
sql <- paste0("SELECT * FROM City WHERE ID = ", id, ";")
dbGetQuery(conn, sql)
})
app_times <- c(app_times, time["elapsed"])
}
invisible(DBI::dbDisconnect(conn))
```
The average of these 30 measurements is `r mean(app_times)` seconds.
### One connection per query
```{r query}
query_times <- numeric(0)
args <- list(
drv = RMySQL::MySQL(),
dbname = "shinydemo",
host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
username = "guest",
password = "guest"
)
for (id in 1:30) {
time <- system.time({
conn <- do.call(DBI::dbConnect, args)
sql <- paste0("SELECT * FROM City WHERE ID = ", id, ";")
dbGetQuery(conn, sql)
DBI::dbDisconnect(conn)
})
query_times <- c(query_times, time["elapsed"])
}
```
The average of these 30 measurements is `r mean(query_times)` seconds.
### Using pool
```{r pool}
pool_times <- numeric(0)
pool <- dbPool(
drv = RMySQL::MySQL(),
dbname = "shinydemo",
host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
username = "guest",
password = "guest"
)
for (id in 1:30) {
time <- system.time({
sql <- paste0("SELECT * FROM City WHERE ID = ", id, ";")
dbGetQuery(pool, sql)
})
pool_times <- c(pool_times, time["elapsed"])
}
poolClose(pool)
```
The average of these 30 measurements is `r mean(pool_times)` seconds.
## Overall comparison
```{r times}
(times <- data.frame(app_times, query_times, pool_times))
```
As you can see, the app-level measurements are the fastest. This is no suprise. The pool measurements take a little over double that time, since you still do have to fetch and return the connection to the pool (in addition to calculating the query itself). Finally, the query-level measurements are the longest. Again, this makes sense because for each query, you're actually fetching a connection from the remote database.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment