@afonsoaugusto
Last active November 7, 2018 14:31

Patterns

Filtering * Sampling * Top-N * Simple Filter * Bloom Filter * Sampling * Random Sampling * Top 10
Summarization * Counting * Min/Max * Statistics * Index Structural * Combining data sets

MapReduce Design Patterns

Quiz filtering:

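    import csv, re, sys

    # Assumption: tab-separated input on stdin, as with forum_node.tsv.
    reader = csv.reader(sys.stdin, delimiter='\t')
    writer = csv.writer(sys.stdout, delimiter='\t')

    for line in reader:
        # Split the body (column 4) on the sentence terminators '.', '!' and '?'.
        lines = re.split(r'[.!?]', line[4])
        # A single piece means the body contains no terminator at all.
        if len(lines) == 1:
            writer.writerow(line)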
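
The snippet above drops a post as soon as it contains any terminator, including one at the very end of the body. A minimal sketch of a looser check follows, assuming the goal is "at most one sentence", i.e. a terminator is tolerated only as the last character. The names `is_single_sentence` and `filter_rows` are hypothetical helpers, not part of the quiz template.

```python
import re

# Assumption: a post counts as "one sentence" when '.', '!' or '?'
# appears nowhere in the body except, optionally, as its last character.
TERMINATOR = re.compile(r'[.!?]')


def is_single_sentence(body):
    """Return True if no terminator occurs before the final character."""
    return TERMINATOR.search(body[:-1]) is None


def filter_rows(rows, body_index=4):
    """Yield only the rows whose body column passes is_single_sentence."""
    for row in rows:
        if is_single_sentence(row[body_index]):
            yield row
```

Wrapping the check in a function keeps the loop over `reader` unchanged: `for line in filter_rows(reader): writer.writerow(line)`.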

Quiz Top 10

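    import csv, sys
    from operator import itemgetter

    # Assumption: tab-separated input on stdin, as with forum_node.tsv.
    reader = csv.reader(sys.stdin, delimiter='\t')
    writer = csv.writer(sys.stdout, delimiter='\t')

    lines = []
    for line in reader:
        line[4] = int(line[4])  # compare column 4 numerically, not as text
        lines.append(line)

    # Sort ascending on column 4, then emit the 10 rows with the largest values.
    lines = sorted(lines, key=itemgetter(4))
    for line in lines[-10:]:
        writer.writerow(line)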
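
Sorting every row works, but it holds the whole input in memory. A possible alternative, sketched here with the standard-library `heapq.nlargest`, keeps only the current top candidates while streaming. The helper name `top_n` and its parameters are hypothetical, and column 4 is assumed to hold an integer as in the snippet above.

```python
import heapq
from operator import itemgetter


def top_n(rows, n=10, index=4):
    """Return the n rows with the largest integer in column `index`,
    ordered from smallest to largest value."""
    # Convert the compared column to int lazily, one row at a time.
    converted = (row[:index] + [int(row[index])] + row[index + 1:] for row in rows)
    largest = heapq.nlargest(n, converted, key=itemgetter(index))
    return largest[::-1]  # nlargest returns descending order; reverse it
```

The result can be written out with `writer.writerows(top_n(reader))`. Rows with equal values may come out in a different relative order than with the full sort.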

Quiz inverted index

  https://github.com/duf59/ud617/blob/master/mapper5.py
  https://github.com/duf59/ud617/blob/master/reducer5.py
  cat forum_node.tsv | ./mapper.py | sort -k 1 | ./reducer.py  | grep -i fantastic
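
For reference, the inverted-index idea can be sketched in memory as below. This is not the content of the linked scripts: the tokenization rule (lowercase runs of letters and digits) and the names `map_words` and `build_index` are assumptions made for illustration.

```python
import re
from collections import defaultdict


def map_words(node_id, body):
    """Mapper step: emit one (word, node_id) pair per word in the body."""
    for word in re.findall(r'[a-z0-9]+', body.lower()):
        yield word, node_id


def build_index(posts):
    """Reducer stand-in: collect, for each word, the ids of the posts using it."""
    index = defaultdict(set)
    for node_id, body in posts:
        for word, nid in map_words(node_id, body):
            index[word].add(nid)
    # Sorted lists make the output deterministic.
    return {word: sorted(ids) for word, ids in index.items()}
```

Looking up `build_index(posts)["fantastic"]` then plays the role of the `grep -i fantastic` step in the pipeline above.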

Quiz Finding Mean
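
A minimal sketch of a mean-per-key reducer, assuming the input is a sequence of (key, value) pairs already sorted by key, as the shuffle/sort phase would deliver them. The name `mean_by_key` and the pair layout are hypothetical; the quiz's actual key and value columns are not recorded in these notes.

```python
from itertools import groupby
from operator import itemgetter


def mean_by_key(pairs):
    """Yield (key, mean) for (key, value) pairs that are sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        total = 0.0
        count = 0
        # Accumulate a running sum and count so no group is held in memory.
        for _, value in group:
            total += float(value)
            count += 1
        yield key, total / count
```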
