Download Umbrella's Top 1 Million Sites List For Last 2 Days And List New Sites
#!/bin/bash
day1=$(date --date="2 days ago" +%Y-%m-%d)
day2=$(date --date="3 days ago" +%Y-%m-%d)
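mkdir -p ~/ut1m
# Stop here if the working directory is unavailable, so nothing below runs in the wrong place
cd ~/ut1m || exit 1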
#Get The More Recent Day's Data (2 Days Ago)
printf "Getting %s Data\n" "$day1"
cd ~/ut1m || exit 1
mkdir -p "$day1"
cd "$day1" || exit 1
wget -q http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-$day1.csv.zip
unzip top-1m-$day1.csv.zip > /dev/null
cut --complement -f 1 -d, top-1m.csv > $day1.csv
sort $day1.csv > $day1.txt
sed -r 's/\s+//g' $day1.txt > $day1-f.txt
sort $day1-f.txt > $day1-a.txt
#Get The Previous Day's Data (3 Days Ago)
printf "Getting %s Data\n" "$day2"
cd ~/ut1m || exit 1
mkdir -p "$day2"
cd "$day2" || exit 1
wget -q http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-$day2.csv.zip
unzip top-1m-$day2.csv.zip > /dev/null
cut --complement -f 1 -d, top-1m.csv > $day2.csv
sort $day2.csv > $day2.txt
sed -r 's/\s+//g' $day2.txt > $day2-f.txt
sort $day2-f.txt > $day2-a.txt
#Find The Differences:
printf "Finding The Differences\n"
cd ~/ut1m
# comm -23 keeps only lines unique to the first (newer) file, i.e. domains that are new on $day1
comm -23 "$day1/$day1-a.txt" "$day2/$day2-a.txt" > newdomains.txt
#Upload to Sprunge
printf "Uploaded to Sprunge Here:\n"
curl -F 'sprunge=<-' http://sprunge.us < newdomains.txt
#Clean Up
#rm -rf *
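
The comparison step in the script above can be tried on two tiny lists. This is only a sketch: `new_entries` is a hypothetical helper name, and the domains are made-up sample data rather than entries from the real list. It assumes `sort`, `comm`, and `mktemp` are available.

```shell
# Hypothetical helper: print entries that appear in the newer list ($1)
# but not in the older list ($2).
new_entries() {
    sorted_new=$(mktemp)
    sorted_old=$(mktemp)
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort -u "$1" > "$sorted_new"
    LC_ALL=C sort -u "$2" > "$sorted_old"
    # -2 drops lines unique to the older list, -3 drops lines common to both.
    LC_ALL=C comm -23 "$sorted_new" "$sorted_old"
    rm -f "$sorted_new" "$sorted_old"
}

# Made-up sample data standing in for the two daily lists.
tmp=$(mktemp -d)
printf '%s\n' example.com example.net new-site.example > "$tmp/newer.txt"
printf '%s\n' example.com example.net dropped.example > "$tmp/older.txt"

new_entries "$tmp/newer.txt" "$tmp/older.txt"   # prints new-site.example
```

By contrast, `comm -3` would print both the added and the dropped entries, with the dropped ones indented by a tab.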

daveio commented Apr 24, 2017

Modified version which won't rm -rf $PWD if run from anywhere but $HOME (or if $HOME/ut1m is a file): https://gist.github.com/daveio/83c1293b6449dcf67beeb90dbc6af836
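
One way to guard against that failure mode is sketched below. `clean_workdir` is a hypothetical helper written for illustration; it is not the code from the linked gist. The idea is to run `rm` only after `cd` into the intended directory has succeeded.

```shell
# Hypothetical helper: delete the contents of a work directory, but only
# if the path exists and is a directory we can enter.
clean_workdir() {
    workdir=$1
    # Refuse empty, missing, or non-directory paths instead of falling
    # back to whatever the current directory happens to be.
    [ -n "$workdir" ] && [ -d "$workdir" ] || return 1
    # Subshell: the caller's working directory is left untouched, and
    # rm runs only when cd succeeded. Dotfiles are not matched by ./*.
    ( cd "$workdir" && rm -rf -- ./* )
}

# Demonstration on a throwaway directory.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt"
clean_workdir "$demo"
clean_workdir "$demo/does-not-exist" || echo "skipped: not a directory"
```

In the script itself, the same effect would come from calling a helper like this with `~/ut1m` instead of running a bare `rm -rf *`.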

Owner

jgamblin commented Apr 24, 2017

@daveio Thanks for the amazing fix!! I commented out the clean up line. As I said on Twitter, I built and tested this quickly. My code is usually not elegant, but I'm glad you caught it.
