@phuedx
Last active August 29, 2015 14:17
Dumb WikiGrok claim aggregation script
import csv
import hashlib

aggregated_claims = {}

# newline='' lets the csv module handle line endings itself.
with open('claims.tsv', 'r', newline='') as raw_claims:
    claims = csv.DictReader(raw_claims, delimiter='\t')

    for claim in claims:
        # Key each claim by a digest of its identifying fields. The separator
        # keeps adjacent fields from running together ('1' + '23' vs '12' + '3').
        digest = hashlib.md5()
        for field in ('event_pageId', 'event_propertyId',
                      'event_subjectId', 'event_valueId'):
            digest.update(claim[field].encode('utf-8'))
            digest.update(b'\t')
        key = digest.hexdigest()

        if key not in aggregated_claims:
            # Keep the first row seen as the representative claim.
            aggregated_claims[key] = {
                'claim': claim,
                'votes': 0,
                'positive_votes': 0,
            }

        aggregated_claims[key]['votes'] += 1

        # A 'NULL' response counts as a vote, but not as a positive one.
        response = claim['event_response']
        if response != 'NULL' and int(response):
            aggregated_claims[key]['positive_votes'] += 1

# Report only claims that received at least 10 votes.
for aggregated_claim in aggregated_claims.values():
    if aggregated_claim['votes'] >= 10:
        print('-' * 10)
        print('CLAIM: ' + str(aggregated_claim['claim']))
        print('VOTES: ' + str(aggregated_claim['votes']))
        print('POSITIVE_VOTES: ' + str(aggregated_claim['positive_votes']))
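
The same aggregation can be sketched as a small function that is easier to test without a `claims.tsv` on disk. This is only an illustration, not the gist author's method: it swaps the MD5 digest for a plain tuple key, and the names `aggregate_claims` and `KEY_FIELDS` and the sample rows are made up for the example. Only the column names come from the script above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical constant: the columns that identify a claim in the script above.
KEY_FIELDS = ('event_pageId', 'event_propertyId',
              'event_subjectId', 'event_valueId')


def aggregate_claims(rows):
    """Count total and positive votes per distinct claim.

    `rows` is any iterable of dicts holding the KEY_FIELDS columns and
    an 'event_response' column ('0', '1', or 'NULL').
    """
    totals = defaultdict(lambda: {'votes': 0, 'positive_votes': 0})
    for row in rows:
        # A tuple of the identifying fields is hashable, so it can be the
        # dict key directly; no digest is needed.
        key = tuple(row[field] for field in KEY_FIELDS)
        totals[key]['votes'] += 1
        response = row['event_response']
        if response != 'NULL' and int(response):
            totals[key]['positive_votes'] += 1
    return dict(totals)


# Made-up sample data standing in for claims.tsv.
sample = (
    "event_pageId\tevent_propertyId\tevent_subjectId\tevent_valueId\tevent_response\n"
    "1\tP31\tQ1\tQ5\t1\n"
    "1\tP31\tQ1\tQ5\t0\n"
    "1\tP31\tQ1\tQ5\tNULL\n"
    "12\tP3\tQ1\tQ5\t1\n"
)

result = aggregate_claims(csv.DictReader(io.StringIO(sample), delimiter='\t'))
for key, counts in result.items():
    print(key, counts)
```

Keeping the key as a tuple also means the identifying fields can be read straight from the result, without storing a representative row next to the counts.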

phuedx commented Mar 17, 2015

Y'all should take a look at phuedx/wikigrok-aggregation-test.
