@rjurney
Created October 29, 2020 18:17
Something is wrong: the local load takes longer than the S3 load over a bad connection.
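# Aliases inferred from the calls below (wr.s3.read_parquet, pq.ParquetDataset)
import awswrangler as wr
import pyarrow.parquet as pq

# How can I be faster?
# Set up a session with credentials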
boto3_session = BarUtils.boto_session(
    aws_access_key_id=s3_key,
    aws_secret_access_key=s3_secret,
)
df = wr.s3.read_parquet(
    path=path,
    dataset=True,
    columns=columns + index_columns,
    partition_filter=filters,
    boto3_session=boto3_session,
    use_threads=True,
)
# How can I be slower? I am Parquet v2.
dataset = pq.ParquetDataset(
    path_or_paths=path,
    filesystem=filesystem,
    filters=filters,
    use_legacy_dataset=False,
)
table = dataset.read_pandas(
    columns=columns + index_columns,
    use_threads=True,
)
df = table.to_pandas(
    use_threads=True,
)
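
To pin down where the time goes, it may help to time each load the same way. The following is a minimal sketch, not part of the original snippet: `timed` is a hypothetical helper that wraps any zero-argument callable, runs it a few times, and reports the best wall-clock time using only the standard library.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(label: str, load: Callable[[], T], repeats: int = 3) -> Tuple[T, float]:
    """Run `load` `repeats` times; return the last result and the best elapsed seconds.

    `timed` is a hypothetical helper for illustration, not part of
    awswrangler or pyarrow.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = load()
        # Keep the fastest run to reduce noise from caching and other processes
        best = min(best, time.perf_counter() - start)
    print(f"{label}: best of {repeats} runs = {best:.3f}s")
    return result, best
```

Each of the two loads above could be wrapped in a `lambda` and passed to `timed`, for example `timed("pyarrow local", lambda: dataset.read_pandas(columns=columns + index_columns, use_threads=True))`. Timing `read_pandas` and `to_pandas` as separate calls would show whether the local slowdown comes from reading the files or from the conversion to pandas.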