@tori-takashi
Created March 15, 2019 06:04
inverted_index_reducer.py
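#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
from collections import defaultdict

# Maps each word to the set of filenames in which it appears.
inverted_index = defaultdict(set)


def reduce(kv):
    # Each input line is "word<TAB>filename"; split on the first tab only.
    word, filename = kv.split('\t', 1)
    inverted_index[word].add(filename.strip())


if __name__ == "__main__":
    # Collect every (word, filename) pair from stdin, then emit the index.
    for kv in sys.stdin:
        reduce(kv)
    for word, filenames in inverted_index.items():
        print('{0}\t{1}'.format(word, filenames))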
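
The reducer above groups tab-separated word/filename pairs read from stdin. The following is a minimal sketch of the same grouping run on in-memory sample data, so the behavior can be checked without a pipeline. The names `build_inverted_index` and `format_index` and the sample input are hypothetical and not part of the gist. The comma-separated, sorted output is also an assumption: the script above prints the set's repr as-is.

```python
import io
from collections import defaultdict


def build_inverted_index(lines):
    """Group "word<TAB>filename" lines into {word: set of filenames}."""
    index = defaultdict(set)
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        word, filename = line.split('\t', 1)
        index[word].add(filename.strip())
    return index


def format_index(index):
    """Return one "word<TAB>file1,file2" string per word, sorted (assumed format)."""
    return ['{0}\t{1}'.format(word, ','.join(sorted(files)))
            for word, files in sorted(index.items())]


# Hypothetical mapper output; the duplicate pair is collapsed by the set.
sample = io.StringIO("apple\ta.txt\nbanana\tb.txt\napple\tb.txt\napple\ta.txt\n")
index = build_inverted_index(sample)
output_lines = format_index(index)
for out in output_lines:
    print(out)
```

Sorting both the words and the filenames makes the output deterministic, which is convenient for comparing runs; set iteration order is otherwise not guaranteed.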