
@hedgehoggski
Last active August 29, 2015 13:57
This relates to https://gist.github.com/johntyree/3331662, a script by @johntyree, and covers the Mac version of its command to grab blocklists (posted in the comments there).
Hi folks. I've analysed the code for @fortran01's Mac version (thanks v. much, fortran01)
and posted a plain English translation of what each bit does below.
I hope it's accurate, but if I have made any errors please correct me! :-)
Hopefully this helps people tweak the code to do other, similar stuff.
To make it clearer what's going on, I've put each piped segment of the command on a separate line,
but remember it's all one big line.
curl -s https://www.iblocklist.com/lists.php
| sed -n "s/.*value='\(http:.*=bt_.*\)'.*/\1/p"
| sed "s/&amp;/\&/g"
| sed "s/http/\"http/g"
| sed "s/gz/gz\"/g"
| xargs curl -L
| gunzip
| egrep -v '^#' > ~/Library/Application\ Support/Transmission/blocklists/generated.txt.bin
**Plain English explanation of what each bit of the command does:**
grab the webpage "https://www.iblocklist.com/lists.php" in silent mode (no progress bar or error messages)
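search each line of this webpage, looking for lines containing text of the form
(anything)value='http:(anything)=bt_(anything)'(anything)
print only the lines that match (that's what the -n option and the final p do), and from each one keep just the URL between the single quotes, i.e. the http:(anything)=bt_(anything) part captured by \( \), throwing away everything before and after it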
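To make the first `sed` step concrete, here is a small offline sketch. The `<option value='…'>` line is made-up sample markup, assumed to resemble what lists.php served at the time (it is not copied from the real page), and `extract_list_urls` is a hypothetical helper name.

```shell
# Hypothetical helper wrapping the first sed stage of the pipeline.
# -n turns off sed's automatic printing and the trailing p prints a line
# only when the substitution succeeded, so lines without a match vanish and
# each matching line is replaced by the \( ... \) group (referenced as \1).
extract_list_urls() {
  sed -n "s/.*value='\(http:.*=bt_.*\)'.*/\1/p"
}

# Made-up sample markup: one line in the assumed <option value='...'> form
# and one line that does not match at all.
printf '%s\n' \
  "<option value='http://list.iblocklist.com/?list=bt_level1&amp;fileformat=p2p&amp;archiveformat=gz'>Level 1</option>" \
  "<p>no list on this line</p>" |
  extract_list_urls
# prints: http://list.iblocklist.com/?list=bt_level1&amp;fileformat=p2p&amp;archiveformat=gz
```

Note that the `&amp;` entities are still there at this point; removing them is the job of the next stage.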
in the resultant lines, change all occurrences of &amp; (an HTML-escaped ampersand) to &
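A quick check of the `&amp;` step on a made-up URL fragment; `decode_amp` is a hypothetical name. The detail worth knowing is that `&` has a special meaning on the replacement side of a sed substitution, so it has to be escaped there.

```shell
# Hypothetical helper for the &amp; -> & stage.
# In the pattern, &amp; is matched literally. In the replacement a bare &
# stands for "the whole matched text", so it must be written \& to insert
# a literal ampersand.
decode_amp() {
  sed 's/&amp;/\&/g'
}

echo 'list=bt_level1&amp;fileformat=p2p&amp;archiveformat=gz' | decode_amp
# prints: list=bt_level1&fileformat=p2p&archiveformat=gz

# Without the backslash, each &amp; is replaced by itself (no change):
echo 'list=bt_level1&amp;fileformat=p2p' | sed 's/&amp;/&/g'
# prints: list=bt_level1&amp;fileformat=p2p
```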
in the resultant lines, change all occurrences of the string http to "http
in the resultant lines, change all occurrences of the string gz to gz"
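The two quoting steps, sketched as one hypothetical helper (`add_quotes`) and applied to one of the URLs listed further down. Because of the `g` flag, any other `http` or `gz` inside a URL would also pick up a quote; the iblocklist URLs only contain each string once.

```shell
# Hypothetical helper combining the two quoting stages: a double quote is
# added before every "http" and after every "gz" (the g flag means every
# occurrence on the line, not just the first).
add_quotes() {
  sed 's/http/"http/g' | sed 's/gz/gz"/g'
}

echo 'http://list.iblocklist.com/?list=bt_level1&fileformat=p2p&archiveformat=gz' | add_quotes
# prints: "http://list.iblocklist.com/?list=bt_level1&fileformat=p2p&archiveformat=gz"
```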
feed the resultant lines to curl via xargs, which strips the double quotes and, by default, passes all the URLs to a single curl command (-L means curl follows redirects, i.e. it automatically re-requests a file if the server says it has moved)
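A rough, offline way to see what `xargs` does here, with `printf` and `sh -c` standing in for curl and shortened placeholder URLs (example.invalid) instead of real ones. It shows that xargs removes the quotes and, by default, packs as many arguments as fit onto one command line, so curl is normally started once with all the URLs.

```shell
# Two lines as produced by the sed stages (shortened placeholder URLs).
urls='"http://example.invalid/?list=a&archiveformat=gz"
"http://example.invalid/?list=b&archiveformat=gz"'

# xargs removes the surrounding double quotes; printf reuses its format
# string for each argument it receives.
printf '%s\n' "$urls" | xargs printf '[%s]\n'
# prints:
# [http://example.invalid/?list=a&archiveformat=gz]
# [http://example.invalid/?list=b&archiveformat=gz]

# Count the arguments each command invocation receives: by default both
# URLs arrive in a single call, just as curl gets all URLs at once.
printf '%s\n' "$urls" | xargs sh -c 'echo "$#"' sh
# prints: 2

# With -n 1, xargs starts one command per URL instead.
printf '%s\n' "$urls" | xargs -n 1 sh -c 'echo "$#"' sh
# prints: 1 (once per URL)
```

xargs does not pass its input through a shell, so the `&` characters reach the command untouched either way.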
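curl writes the downloaded files one after another to its output, and all of that is fed to the gunzip program, which uncompresses it (gunzip copes with several gzip files joined together)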
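Since curl writes every download to standard output back to back, gunzip receives several gzip files stuck together. The gzip format allows a file to consist of several such members, and gunzip decompresses them one after another into a single output. A small offline sketch, using two made-up p2p-style entries:

```shell
tmp=$(mktemp -d)

# Two made-up p2p-style entries, compressed separately as stand-ins for two
# downloaded blocklists (192.0.2.0/24 and 198.51.100.0/24 are documentation ranges).
printf 'Example A:192.0.2.0-192.0.2.255\n' | gzip > "$tmp/a.gz"
printf 'Example B:198.51.100.0-198.51.100.255\n' | gzip > "$tmp/b.gz"

# Glue the compressed files together, as curl does on its output, and
# decompress the combined stream in one go.
combined=$(cat "$tmp/a.gz" "$tmp/b.gz" | gunzip)
printf '%s\n' "$combined"
# prints:
# Example A:192.0.2.0-192.0.2.255
# Example B:198.51.100.0-198.51.100.255

rm -r "$tmp"
```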
write only the lines from each file that don't start with a # (i.e. that are not comments) into the file
"~/Library/Application Support/Transmission/blocklists/generated.txt.bin"
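The comment filter, tried on a made-up sample of decompressed p2p data. The sketch uses plain `grep -v`; `egrep` (the same as `grep -E`) behaves identically for this simple pattern.

```shell
# Made-up sample of decompressed p2p data; the comment lines start with #.
sample='# made-up comment line
# another comment
Example A:192.0.2.0-192.0.2.255
Example B:198.51.100.0-198.51.100.255'

# -v inverts the match, so only lines that do NOT start with # survive.
filtered=$(printf '%s\n' "$sample" | grep -v '^#')
printf '%s\n' "$filtered"
# prints:
# Example A:192.0.2.0-192.0.2.255
# Example B:198.51.100.0-198.51.100.255
```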
All this ultimately results in the following lines being handed by xargs to the curl command:
"http://list.iblocklist.com/?list=bt_level1&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_level2&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_level3&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_edu&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_rangetest&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_bogon&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_ads&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_spyware&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_proxy&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_templist&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_microsoft&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_spider&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_hijacked&fileformat=p2p&archiveformat=gz"
"http://list.iblocklist.com/?list=bt_dshield&fileformat=p2p&archiveformat=gz"
This should help clarify exactly what pattern sed searches for and what the sed filtering actually does. For example:
"&amp;" becomes "&"
double quotes are placed before each http and after each gz
You could, of course, feed these lines to curl manually if you just want to grab individual gzipped blocklists.
If you do, don't forget to uncompress them with gunzip, filter out the comment lines with egrep and write the result to a .bin file for Transmission :-)
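For grabbing a single list by hand, the same steps fit in a small shell function. `fetch_blocklist` and the output file name `level1.txt.bin` are hypothetical; treat this as a sketch rather than a tested Transmission recipe.

```shell
# Hypothetical helper: fetch one gzipped p2p blocklist, uncompress it,
# strip comment lines and write what is left to OUTFILE.
# Usage: fetch_blocklist URL OUTFILE
fetch_blocklist() {
  curl -s -L "$1" | gunzip | grep -v '^#' > "$2"
}

# Example call (needs network access; the output file name is only an illustration):
# fetch_blocklist "http://list.iblocklist.com/?list=bt_level1&fileformat=p2p&archiveformat=gz" \
#   ~/Library/Application\ Support/Transmission/blocklists/level1.txt.bin
```

The URL has to be quoted when typed at a shell prompt, because an unquoted `&` ends the command and runs it in the background. Also, the `>` redirection creates or empties OUTFILE even if the download fails, so it's worth checking the file before Transmission picks it up.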