@cboettig
Last active September 20, 2023 00:52
## Forecasts data: 300+ GB spread across many very small partitions.
## Each call times only opening the dataset; no query is run.
uri <- "s3://anonymous@bio230014-bucket01/neon4cast-forecasts/parquet/aquatics?endpoint_override=sdsc.osn.xsede.org"
bench::bench_time(df <- duckdbfs::open_dataset(uri))
#  process     real
#    1.23m    4.91m
bench::bench_time(df <- arrow::open_dataset(uri))
#  process     real
#    4.21m   32.46m
## Scores data: ~10 GB
uri <- "s3://anonymous@bio230014-bucket01/neon4cast-scores/parquet/aquatics?endpoint_override=sdsc.osn.xsede.org"
bench::bench_time(df <- duckdbfs::open_dataset(uri))
#  process     real
#    2.66s   10.81s
bench::bench_time(df <- arrow::open_dataset(uri))
#  process     real
#      12s    41.5s
## Scores data from the source.coop bucket (region us-west-2)
uri <- "s3://anonymous@us-west-2.opendata.source.coop/eco4cast/neon4cast-scores/parquet/aquatics?region=us-west-2"
bench::bench_time(df <- duckdbfs::open_dataset(uri))
#  process     real
#    4.95s    19.6s
bench::bench_time(df <- arrow::open_dataset(uri))
#  process     real
#   36.27s    1.48m
## Scores data from the data.ecoforecast.org endpoint (duckdbfs only)
uri <- "s3://neon4cast-scores/parquet/aquatics?endpoint_override=data.ecoforecast.org"
bench::bench_time(df <- duckdbfs::open_dataset(uri))
#  process     real
#     4.3s    42.7s