@aaronwolen
Last active November 14, 2023 21:35
Example using TileDB-R's batched reader API.
#! /usr/bin/env Rscript
library(tiledb)
library(data.table)
library(nycflights13)
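# write the array to a temp directory, removing any copy left by a previous run
TILEDB_URI <- file.path(tempdir(), "tiledb-flights")
if (dir.exists(TILEDB_URI)) unlink(TILEDB_URI, recursive = TRUE)
# per-batch stats dumps go in ./stats
STATS_DIR <- file.path(getwd(), "stats")
dir.create(STATS_DIR, recursive = TRUE, showWarnings = FALSE)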
fromDataFrame(
  obj = flights,
  uri = TILEDB_URI,
  col_index = "time_hour",
  sparse = TRUE
)
# restrict the buffer allocation to 1 MiB (1024^2 bytes) so the read
# cannot complete in a single pass and must be batched
set_allocation_size_preference(1024^2)
# enable tiledb stats
tiledb_stats_enable()
tdb <- tiledb_array(TILEDB_URI, query_layout = "UNORDERED")
# create a batched query
batched <- createBatched(tdb)
flights2 <- list()
i <- 1
# fetch one batch per iteration until the query reports completion
while (!completedBatched(batched)) {
  message("Batch ", i)
  flights2[[i]] <- fetchBatched(tdb, batched)
  # dump the stats for this batch to its own file
  tiledb_stats_dump(file.path(STATS_DIR, paste0("batch_", i, ".txt")))
  i <- i + 1
  # reset the stats after each query
  tiledb_stats_reset()
}
# combine the batches into a single data.table and print it
flights2 <- data.table::rbindlist(flights2)
flights2