@darkarnium
Created December 5, 2016 03:34
Fetch the Alexa 'Top 1,000,000' site list and munge it into a list of domains only.
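#!/bin/bash
# Fetch the Alexa Top 1M archive, extract the CSV, and reduce it to one domain per line.
ALEXA_STATIC_1M="http://s3.amazonaws.com/alexa-static/top-1m.csv.zip"

echo 'Attempting to fetch Alexa Top 1M archive...'
# -f: treat HTTP errors (e.g. 404) as failures; without it curl exits 0 and saves the error page.
if ! curl -f -s -o top-1m.csv.zip "$ALEXA_STATIC_1M"; then
    echo 'FAILED: Could not fetch file from remote server.' >&2
    exit 1
fi

echo 'Attempting to extract archive...'
# -o: overwrite an existing top-1m.csv instead of prompting on a re-run.
if ! unzip -o top-1m.csv.zip; then
    echo 'FAILED: Could not extract CSV file from archive.' >&2
    exit 1
fi

echo 'Attempting to prepare list of domains...'
# Keep only the second comma-separated field (the domain) of each row.
if ! cut -d ',' -f 2 top-1m.csv > top-1m.txt; then
    echo 'FAILED: Could not extract domains from list.' >&2
    exit 1
fi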
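A variant sketch of the same pipeline that leaves no intermediate files in the working directory. It assumes curl and Info-ZIP unzip are installed and that the archive contains a member named top-1m.csv (as the script above expects). It downloads to a temporary file, streams the CSV out of the archive with `unzip -p` instead of writing top-1m.csv to disk, and keeps the second comma-separated field. The function names `extract_domains` and `fetch_domains` are hypothetical and not part of the original gist.

```shell
#!/bin/bash
# Hypothetical helper: read CSV rows on stdin, print the second
# comma-separated field (the domain) of each row.
extract_domains() {
    cut -d ',' -f 2
}

# Hypothetical helper: fetch_domains URL OUTFILE
# The body is a ( ) subshell, so pipefail and the EXIT trap do not
# leak into the caller's shell.
fetch_domains() (
    set -o pipefail              # a failing unzip fails the whole pipeline
    url=$1
    out=$2
    tmp=$(mktemp) || exit 1
    trap 'rm -f "$tmp"' EXIT     # always remove the downloaded archive
    # -f: fail on HTTP errors; -sS: no progress bar, but still report errors.
    curl -fsS -o "$tmp" "$url" || exit
    # -p: write the member's contents to stdout rather than to disk.
    unzip -p "$tmp" top-1m.csv | extract_domains > "$out"
)
```

Usage: `fetch_domains "http://s3.amazonaws.com/alexa-static/top-1m.csv.zip" top-1m.txt || echo 'FAILED' >&2`. The explicit `|| exit` checks are used instead of `set -e` because bash ignores `set -e` inside a function called from an `if` or `||` context.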