@hammady
Created August 24, 2016 08:34
Clean your S3 logs
#!/bin/bash
echo "Prerequisites:"
echo "sudo apt-get install awscli"
echo "aws configure"
echo "Usage:"
echo "aws s3 ls s3://BUCKET/PREFIX/ | cut -c32- | ./clean-logs.sh"
bucket=SET_BUCKET_NAME
prefix=SET_PREFIX
# trace each command as it runs (bash xtrace mode)
set -x
# iterate on file list from stdin
# IFS= and -r keep leading spaces and backslashes in file names intact
while IFS= read -r l
do
    # download from s3
    aws s3 cp "s3://$bucket/$prefix/$l" .
    # uncompress
    gunzip "$l"
    # get the uncompressed name
    f=$(basename "$l" .gz)
    # do the actual cleaning, your mileage may vary
    grep -v ' heroku .*sample#' "$f" | grep -v ' app prediction' > tmp
    mv tmp "$f"
    # compress back the new file
    gzip "$f"
    # upload again to s3
    aws s3 cp "$l" "s3://$bucket/$prefix/$l"
    # cleanup
    rm "$l"
done
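
Because the script overwrites the objects in the bucket, it may be worth trying the filter on a few lines locally first. The following is only a sketch: `clean_filter` is a hypothetical helper name that wraps the two grep patterns from the script, and the sample log lines are invented for illustration rather than taken from real logs.

```shell
#!/bin/bash
# Hypothetical helper: the same two filters as the cleaning step, reading stdin.
clean_filter() {
    grep -v ' heroku .*sample#' | grep -v ' app prediction'
}

# Made-up sample lines, for illustration only.
sample='2016-08-24T08:00:00 heroku web.1 - sample#memory_total=100MB
2016-08-24T08:00:01 app prediction served
2016-08-24T08:00:02 app web.1 - GET /health 200'

# Only lines matching neither pattern should survive.
kept=$(printf '%s\n' "$sample" | clean_filter)
echo "$kept"
```

If the surviving lines are not the ones you expect, adjust the patterns here before running the script against the bucket.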