@prasanthj
Last active May 19, 2020 01:44
S3 sync
apt-get update
apt-get install -y python curl vim
curl -O https://bootstrap.pypa.io/get-pip.py
export PATH=~/.local/bin:$PATH
python get-pip.py --user
pip install awscli --upgrade --user
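The export above only lasts for the current shell. The following optional sketch is not part of the original notes and assumes bash is the login shell. It persists ~/.local/bin on PATH through ~/.bashrc and then checks that the aws command installed by pip --user resolves.

```shell
#!/usr/bin/env bash
set -euo pipefail

# pip --user places the aws entry point in ~/.local/bin.
user_bin="$HOME/.local/bin"

# Use it in this shell right away.
export PATH="$user_bin:$PATH"

# Persist it for future interactive bash shells (single quotes keep $HOME literal).
line='export PATH="$HOME/.local/bin:$PATH"'
touch "$HOME/.bashrc"
grep -qxF "$line" "$HOME/.bashrc" || echo "$line" >> "$HOME/.bashrc"

# Confirm the CLI resolves from PATH.
if command -v aws >/dev/null 2>&1; then
  aws --version
else
  echo "aws not found on PATH; re-run the pip install step" >&2
fi
```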
mkdir -p ~/.aws
vim ~/.aws/config
[default]
aws_access_key_id=<AWS-ACCESS-KEY-HERE>
aws_secret_access_key=<AWS-SECRET-KEY-HERE>
s3 =
  max_concurrent_requests = 200
  max_queue_size = 10000
  multipart_threshold = 64MB
  multipart_chunksize = 16MB
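You can write the same S3 transfer settings without opening vim. The heredoc sketch below is a convenience and not part of the original steps. It overwrites the file. It honors AWS_CONFIG_FILE when that variable is set, because the CLI reads it too. It also leaves out the two credential lines, so add those back under [default] or keep them in ~/.aws/credentials. The nested keys under "s3 =" are indented so the CLI reads them as part of the s3 setting.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Default to ~/.aws/config unless AWS_CONFIG_FILE points elsewhere.
config_file="${AWS_CONFIG_FILE:-$HOME/.aws/config}"
mkdir -p "$(dirname "$config_file")"

# Overwrites any existing file; credential lines intentionally omitted.
cat > "$config_file" <<'EOF'
[default]
s3 =
  max_concurrent_requests = 200
  max_queue_size = 10000
  multipart_threshold = 64MB
  multipart_chunksize = 16MB
EOF

echo "wrote $config_file"
```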
NOTE: the AWS credentials above must be able to read from the source bucket and write to the destination bucket. If SSE-KMS is specified (as in the command below), they also need permission to use the KMS key that encrypts the data in the destination bucket.
aws s3 sync s3://<SRC-BUCKET>/<PATH-PREFIX>/tpcds_bin_partitioned_orc_10000.db/ s3://<DEST-BUCKET>/<PATH-PREFIX>/tpcds_bin_partitioned_orc_10000.db/ --sse aws:kms --sse-kms-key-id <KMS-KEY-ID>
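The sync command above can be parameterized so the same invocation is printed for review, previewed with --dryrun, and then timed in the background. The "[1]+ Done time aws s3 sync" line in the sample run suggests the original was launched under time as a background job. This is a hypothetical wrapper: the bucket, prefix and key values are placeholders, and it only calls aws when RUN=1 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder values; override via environment variables.
SRC_BUCKET="${SRC_BUCKET:-my-src-bucket}"
DEST_BUCKET="${DEST_BUCKET:-my-dest-bucket}"
PATH_PREFIX="${PATH_PREFIX:-warehouse}"
KMS_KEY_ID="${KMS_KEY_ID:-my-kms-key-id}"
DB_DIR="tpcds_bin_partitioned_orc_10000.db"

# Build the command as an array so paths and key IDs are passed verbatim.
sync_cmd=(aws s3 sync
  "s3://${SRC_BUCKET}/${PATH_PREFIX}/${DB_DIR}/"
  "s3://${DEST_BUCKET}/${PATH_PREFIX}/${DB_DIR}/"
  --sse aws:kms --sse-kms-key-id "${KMS_KEY_ID}")

# Print the exact command for review.
printf '%q ' "${sync_cmd[@]}"; echo

if [[ "${RUN:-0}" == 1 ]]; then
  # --dryrun lists the operations that would be performed without running them.
  "${sync_cmd[@]}" --dryrun > sync-dryrun.log
  echo "dry run listed $(wc -l < sync-dryrun.log) lines"

  # Real copy, timed, in the background; CLI output and timing go to sync.log.
  ( time "${sync_cmd[@]}" ) > sync.log 2>&1 &
  echo "started sync as background job $!"
fi
```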
Sample run (the sync was started in the background under time):
Completed 2.2 TiB/2.2 TiB (3.1 GiB/s) with 0 file(s) remaining
real 12m0.114s
user 13m29.787s
sys 5m44.783s
[1]+ Done time aws s3 sync s3://source/bucket/path/to/warehouse-id/warehouse/tablespace/managed/hive/tpcds_copy_orc_partitioned_10000.db/ s3://destination/bucket/path/to/warehouse-id/warehouse/tablespace/managed/hive/tpcds_copy_orc_partitioned_10000.db/ --sse aws:kms --sse-kms-key-id <destination-encryption-key>
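As a sanity check on the timing output, the figures can be turned into average throughput and average CPU use with awk. The 2.2 TiB total is rounded in the CLI's progress line, so the results are approximate. This sketch uses only the numbers from the sample run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Figures copied from the sample run.
copied_tib=2.2
real_s=$(awk 'BEGIN { print 12*60 + 0.114 }')                        # real 12m0.114s
cpu_s=$(awk 'BEGIN { print (13*60 + 29.787) + (5*60 + 44.783) }')    # user + sys

# Average throughput in GiB/s (1 TiB = 1024 GiB).
gib_per_s=$(awk -v t="$copied_tib" -v s="$real_s" 'BEGIN { printf "%.2f", t * 1024 / s }')

# Average number of CPU cores busy over the run: (user + sys) / real.
cores=$(awk -v c="$cpu_s" -v s="$real_s" 'BEGIN { printf "%.1f", c / s }')

echo "throughput: ${gib_per_s} GiB/s, avg CPU: ${cores} cores"
```

It prints about 3.13 GiB/s, which agrees with the 3.1 GiB/s reported by the CLI. It also shows about 1.6 cores in use, so the copy is limited by S3 request throughput rather than by local CPU.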