Created August 28, 2013 18:27
Regex URLs to determine political leaning via labelled sources
import re

# List of URLs (pretend it's populated)
urls = []

# Patterns to match in URLs. Some entries include the .org or .com suffix to
# avoid matching common words or letter sequences (e.g. npr, slate, today).
# Dots are escaped so they match a literal "." rather than any character.
# Add whatever other domains you want to match to the regex OR (|) string.
left_pattern = r"(?P<domain>nytimes|washingtonpost|npr\.org|abcnews|nbcnews|huffingtonpost|slate\.com|today\.com)"
center_pattern = r"(?P<domain>cnn|bbc\.co\.uk|yahoo)"
right_pattern = r"(?P<domain>foxnews|washingtontimes|usnews|chicagotribune)"

# Keep counts in a dictionary
affiliation_count = {"left": 0, "right": 0, "center": 0, "unknown": 0}

for url in urls:
    if re.search(left_pattern, url):
        # Add "left" annotation to URL object in database via PyMongo
        affiliation_count["left"] += 1
    elif re.search(right_pattern, url):
        # Add "right" annotation to URL object in database via PyMongo
        affiliation_count["right"] += 1
    elif re.search(center_pattern, url):
        # Add "center" annotation to URL object in database via PyMongo
        affiliation_count["center"] += 1
    else:
        # Add "unknown" annotation to URL object in database via PyMongo
        affiliation_count["unknown"] += 1
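Because `re.search` matches anywhere in the URL, a pattern such as `today.com` also matches `usatoday.com`, and `cnn` matches any path that happens to contain those letters. One possible alternative, sketched below, is to parse out the hostname and look it up in a table of labelled domains. This is not part of the original script: the names `DOMAIN_LEANING`, `classify`, and `count_affiliations` are hypothetical, and the full domain suffixes (for example `abcnews.go.com` or `yahoo.com`) are assumptions filled in for entries the patterns listed without one.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lookup table derived from the domains in the patterns above.
# Suffixes not present in the original patterns are assumptions.
DOMAIN_LEANING = {
    "nytimes.com": "left",
    "washingtonpost.com": "left",
    "npr.org": "left",
    "abcnews.go.com": "left",
    "nbcnews.com": "left",
    "huffingtonpost.com": "left",
    "slate.com": "left",
    "today.com": "left",
    "cnn.com": "center",
    "bbc.co.uk": "center",
    "yahoo.com": "center",
    "foxnews.com": "right",
    "washingtontimes.com": "right",
    "usnews.com": "right",
    "chicagotribune.com": "right",
}


def classify(url):
    """Return the leaning for a URL, or "unknown" if its host is not listed."""
    # urlparse only fills in the host when the URL has a scheme or starts
    # with "//", so prefix bare URLs such as "foxnews.com/politics".
    if "://" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then progressively shorter suffixes, so that
    # "www.bbc.co.uk" matches "bbc.co.uk" but "usatoday.com" never matches
    # "today.com" (suffixes are split on dots, not on arbitrary characters).
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_LEANING:
            return DOMAIN_LEANING[candidate]
    return "unknown"


def count_affiliations(urls):
    """Count how many URLs fall into each leaning category."""
    counts = Counter({"left": 0, "right": 0, "center": 0, "unknown": 0})
    counts.update(classify(u) for u in urls)
    return dict(counts)
```

The trade-off is that every domain has to be listed with its real suffix, whereas the regex version tolerates unknown suffixes at the cost of occasional false positives.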