Steven Englehardt englehardt

englehardt /
Created Aug 16, 2019
A file for converting OpenWPM sqlite databases to parquet on S3. This also requires the appropriate `` file that matches the sqlite schema. See:
""" This script reads a sqlite database and writes its contents to a parquet
database on S3, formatted the way OpenWPM would format it. It's best to run
this on AWS, since the S3 upload is the bottleneck. This is a lightly modified
version of OpenWPM's S3Aggregator class.
"""
import os
import sqlite3
import sys
from collections import defaultdict
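The preview above is cut off before any conversion logic, so the following is only a minimal sketch of the first half of such a pipeline, not the gist's actual code. It assumes Python 3 and uses only the standard library: it reads a sqlite table in batches and regroups each batch by column, which is the shape a columnar writer (for example pyarrow's `Table.from_pydict`) would expect. The helper name `iter_column_batches`, the batch size, and the sample table contents are all hypothetical.

```python
import sqlite3
from collections import defaultdict


def iter_column_batches(conn, table, batch_size=1000):
    """Yield one dict per batch, mapping column name -> list of values."""
    # Table names cannot be bound as SQL parameters, so pass trusted names only.
    cursor = conn.execute('SELECT * FROM "%s"' % table)
    columns = [desc[0] for desc in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        batch = defaultdict(list)
        for row in rows:
            for name, value in zip(columns, row):
                batch[name].append(value)
        yield dict(batch)


# Illustrative in-memory database; a real crawl database would be opened by path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_visits (visit_id INTEGER, site_url TEXT)")
conn.executemany(
    "INSERT INTO site_visits VALUES (?, ?)",
    [(1, "http://a.example"), (2, "http://b.example"), (3, "http://c.example")],
)
batches = list(iter_column_batches(conn, "site_visits", batch_size=2))
```

Batching keeps memory bounded for large crawl databases. Each yielded dict could then be handed to whatever parquet writer and S3 client the real script uses.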
englehardt /
Created May 14, 2019
Example of how to use the DisconnectParser included in `trackingprotection_tools` (
from trackingprotection_tools import DisconnectParser
BLOCKLIST_URL = '' # noqa
REMAPPING_URL = '' # noqa
dc = DisconnectParser(
englehardt /
Created Apr 18, 2019
Generate a list of safebrowsing hashes from the raw Disconnect list
import base64
import hashlib
import json
import re
import urllib2
from trackingprotection_tools import DisconnectParser
'Advertising', 'Analytics', 'Social', 'Content', 'Disconnect'
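The imports above (`base64`, `hashlib`) suggest that each blocklist domain is hashed and then base64-encoded. The sketch below illustrates that idea under stated assumptions: it is written for Python 3 (the gist itself targets Python 2), it assumes each entry is hashed as the lowercased domain followed by a single trailing slash, and the helper name `safebrowsing_hash` is hypothetical. The exact canonicalization the gist applies is not visible in the preview and may differ.

```python
import base64
import hashlib


def safebrowsing_hash(domain):
    """Return the base64-encoded SHA-256 digest of '<domain>/'."""
    # Assumption: entries are hashed as the lowercased host plus a trailing
    # slash; the real list format may canonicalize differently.
    expression = domain.strip().lower().rstrip("/") + "/"
    digest = hashlib.sha256(expression.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# Illustrative domains only; a real run would iterate over the parsed blocklist.
hashes = sorted(safebrowsing_hash(d) for d in ["example.com", "tracker.example"])
```

Sorting the output makes the generated list deterministic, which helps when diffing two versions of the hashed list.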
englehardt /
Last active Jul 26, 2018
A utility file to retrieve and parse the Alexa Top 1 Million site list
from StringIO import StringIO
import requests
import zipfile
import random
import json
import os
EC2_LIST = ''
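As a rough illustration of the parsing step, here is a minimal Python 3 sketch (the gist itself uses Python 2's `StringIO`). It assumes the list arrives as a zip archive containing one CSV member of `rank,domain` rows; the member name `top-1m.csv` and the helper name `parse_top_sites` are assumptions, and the download step is left out so the example stays self-contained.

```python
import csv
import io
import zipfile


def parse_top_sites(zip_bytes, member="top-1m.csv", limit=None):
    """Return a list of (rank, domain) tuples from a zipped rank,domain CSV."""
    sites = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for rank, domain in csv.reader(text):
                sites.append((int(rank), domain))
                if limit is not None and len(sites) >= limit:
                    break
    return sites


# Build a tiny archive in memory to stand in for the downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("top-1m.csv", "1,google.com\n2,youtube.com\n3,facebook.com\n")
top_two = parse_top_sites(buffer.getvalue(), limit=2)
```

In practice the bytes would come from the response body of an HTTP request (the gist imports `requests`), and a sample could be drawn from the result with `random.sample`.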
englehardt /
Last active Nov 6, 2017
A requests-based crawler to gather internal links off of the homepage content of sites.
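No code preview is shown for this gist, so the following is only a sketch of the link-extraction step such a crawler needs, using the Python 3 standard library. It treats a link as internal when its hostname exactly matches the page's hostname, which is an assumption; the original may compare registered domains instead. The names `_LinkCollector` and `internal_links` are hypothetical, and fetching the page (for example with `requests.get`) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return sorted absolute URLs on the same host as page_url."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).hostname
    links = set()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.hostname == host:
            links.add(absolute.split("#")[0])  # drop fragments
    return sorted(links)


sample = (
    '<a href="/about">A</a> <a href="https://other.example/x">B</a> '
    '<a href="news#top">C</a> <a href="mailto:a@b.c">D</a>'
)
found = internal_links("http://site.example/", sample)
```

Resolving each href against the page URL before comparing hosts means relative links are kept and off-site or non-HTTP links are dropped.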
englehardt / Twitter-Remove_Likes.user.js
Last active Sep 7, 2018
Greasemonkey userscript to remove tweets from timeline which only show up because they were liked by someone you follow.
// ==UserScript==
// @name Remove Likes on Twitter
// @namespace twitter
// @include
// @version 2
// @grant GM_addStyle
// ==/UserScript==
GM_addStyle('div.promoted-tweet, div[data-component-context=suggest_activity_tweet] {display: none !important}');
from collections import defaultdict
import json
import dill
import os
DATA_DIR = './'
WEBXRAY_LIST = 'webxray_orgs.json'
DISCONNECT_LIST = 'disconnect_list.json'
OUT_LIST = 'merged_organizations.dill'
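The constants above point to two input lists being merged into one output. The sketch below shows one plausible way to do that merge, assuming both inputs can be reduced to a mapping of organization name to a list of domains. The real schemas of the webXray and Disconnect files are not shown in the preview, so the function name `merge_organizations`, the lowercase name normalization, and the sample data are all hypothetical.

```python
from collections import defaultdict


def merge_organizations(*org_maps):
    """Merge {organization: [domains]} mappings, deduplicating domains."""
    merged = defaultdict(set)
    for org_map in org_maps:
        for org, domains in org_map.items():
            # Assumption: names match after trimming and lowercasing.
            merged[org.strip().lower()].update(domains)
    return {org: sorted(domains) for org, domains in merged.items()}


# Hypothetical stand-ins for the two parsed input lists.
webxray = {"Org A": ["a.example"], "Org B": ["b.example"]}
disconnect = {"org a": ["a.example", "cdn-a.example"]}
merged = merge_organizations(webxray, disconnect)
```

The result is a plain dict of sorted lists, so it can be serialized with `json` as well as `dill`.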
View organizations.json
"": [""],
"marketgid": ["", "", ""],
"madvertise": [""],
"voice2page": [""],
"mixpanel": [""],
"automattic": ["", "", "", "", "", ""],
"game advertising online": [""],
"adconion": ["", "", "", ""],
"sogou": ["", ""],
englehardt /
Created Sep 20, 2016
BlockListParser Utilities
This file contains a collection of utilities for working with BlockListParser
using http data, such as that collected by OpenWPM (
publicsuffix ( is required
Example usage:
from publicsuffix import PublicSuffixList
from BlockListParser import BlockListParser
englehardt /
Created Dec 17, 2015
Submit HTTP Authentication credentials with Selenium. Note that although the methods exist, Selenium doesn't seem to support native HTTP Auth handling in Firefox.
Steven Englehardt
Some dependencies (probably not exhaustive):
sudo apt-get install python-Xlib scrot xserver-xephyr
sudo pip install pyautogui pyvirtualdisplay
This needs access to a Firefox binary, and hardcodes a relative location.