@matthewrobertbell
Last active December 16, 2015 05:09
import collections
from urllib.parse import urlparse  # Python 3; Python 2 used "import urlparse"

# Group the non-blank URLs from urls.txt by domain.
data = collections.defaultdict(set)
with open('urls.txt') as f:
    for line in f:
        url = line.strip()
        if url:
            data[urlparse(url).netloc].add(url)

# Each pass takes at most one URL per domain, until every set is empty.
while True:
    current_list = [urls.pop() for urls in data.values() if urls]
    if not current_list:
        break
    print(current_list)
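
The same batching idea can be tried without a urls.txt file. The sketch below is an assumption-laden variant, not the gist author's code: `batches_by_domain` and the `sample` URLs are hypothetical names invented for illustration, and it swaps the sets for deques so that the output order is predictable (which also means duplicate URLs are kept rather than collapsed).

```python
import collections
from urllib.parse import urlparse


def batches_by_domain(urls):
    """Yield lists of URLs holding at most one URL per domain."""
    # Hypothetical helper, not part of the original gist.
    queues = collections.defaultdict(collections.deque)
    for url in urls:
        queues[urlparse(url).netloc].append(url)
    while True:
        # Take the oldest remaining URL from every domain that still has one.
        batch = [q.popleft() for q in queues.values() if q]
        if not batch:
            return
        yield batch


# Made-up sample data: two URLs on one domain, one on another.
sample = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
]
for batch in batches_by_domain(sample):
    print(batch)
```

Each printed batch can be fetched in parallel without hitting any single domain more than once per round, which appears to be the purpose of the original loop.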