Created October 29, 2020 18:17
Gist: rjurney/df4b37b6adef1646f95bcb0e81891ed8
Something is wrong: the local load takes longer than the S3 load over a bad connection.
# How can I be faster?
import awswrangler as wr

# Set up a session with credentials
# (BarUtils is not defined in this gist)
boto3_session = BarUtils.boto_session(
    aws_access_key_id=s3_key,
    aws_secret_access_key=s3_secret,
)
# Read only the requested columns from the partitions that pass the filter
df = wr.s3.read_parquet(
    path=path,
    dataset=True,
    columns=columns + index_columns,
    partition_filter=filters,
    boto3_session=boto3_session,
    use_threads=True,
)
# How can I be slower? I am Parquet v2.
import pyarrow.parquet as pq

# filesystem is not defined in this gist
dataset = pq.ParquetDataset(
    path_or_paths=path,
    filesystem=filesystem,
    filters=filters,
    use_legacy_dataset=False,
)
# read_pandas returns a pyarrow Table and keeps the pandas index columns
table = dataset.read_pandas(
    columns=columns + index_columns,
    use_threads=True,
)
# Convert the Arrow table to a pandas DataFrame
df = table.to_pandas(
    use_threads=True,
)
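
To check the claim that the local load is slower than the S3 load, it may help to time both calls the same way. The following is a minimal sketch, not part of the original gist: `time_call` is a hypothetical helper name, and the workload in the demo is a stand-in for the real reads.

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[[], Any], repeats: int = 3) -> Tuple[float, Any]:
    """Run fn() `repeats` times; return (best wall-clock seconds, last result).

    Hypothetical helper for comparing the two loaders; not from the gist.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with one of the reads above.
    seconds, value = time_call(lambda: sum(range(1_000_000)))
    print(f"best of 3: {seconds:.4f}s")
```

Each read can be wrapped in a zero-argument lambda, for example `time_call(lambda: wr.s3.read_parquet(...))`, with the same arguments as in the snippets above. Repeated local reads may be served from the operating system's file cache, so the first run and the best run can differ noticeably.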