@jhowliu
Last active August 29, 2015 14:15

Notes on the MapReduce Framework by Google Inc.

Introduction

Purpose

  • Process large amounts of raw data (e.g., web request logs, crawled documents).
  • Compute various kinds of derived data (e.g., inverted indices).
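An inverted index is a typical piece of derived data: it maps each word to the documents that contain it. The sketch below is a minimal single-machine illustration, assuming documents are given as a dict of hypothetical document IDs to text. It follows the paper's description of the map step (emit word/document ID pairs) and the reduce step (sort the document IDs for each word). The function name `build_inverted_index` and the de-duplication of repeated words within a document are illustrative choices.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of document IDs that contain it.

    docs: dict mapping a (hypothetical) document ID to its text.
    """
    # Map step: emit one (word, doc_id) pair per distinct word in each document.
    pairs = []
    for doc_id, text in docs.items():
        for word in set(text.split()):
            pairs.append((word, doc_id))

    # Grouping step: collect every doc_id emitted for the same word.
    grouped = defaultdict(list)
    for word, doc_id in pairs:
        grouped[word].append(doc_id)

    # Reduce step: sort the document IDs for each word.
    return {word: sorted(ids) for word, ids in grouped.items()}


docs = {"d1": "the quick fox", "d2": "the lazy dog"}
index = build_inverted_index(docs)
print(index["the"])  # ['d1', 'd2']
```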

Operators

  • Map: The Map function takes an input key/value pair, processes it, and generates zero or more intermediate key/value pairs. It is applied to each pair in the input.
  • Reduce: The Reduce function receives one intermediate key and iterates through all the values associated with that key, producing zero or more outputs.
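The two operators can be wired together with a small sequential driver. The sketch below makes these assumptions: everything runs on one machine, and intermediate values are grouped in an in-memory dict. The names `run_mapreduce`, `parse_sales`, and `total` are hypothetical. The driver applies Map to every input pair, groups the intermediate values by key, and hands Reduce an iterator over each key's values. The example map emits zero pairs for blank lines, which shows the "zero or more" behavior.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Sequential sketch: inputs is an iterable of (key, value) pairs."""
    # Map phase: each input pair may yield zero or more intermediate pairs.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)

    # Reduce phase: visit keys in sorted order, passing an iterator over values.
    results = {}
    for ikey in sorted(intermediate):
        results[ikey] = list(reduce_fn(ikey, iter(intermediate[ikey])))
    return results


def parse_sales(line_no, line):
    # Zero outputs for a blank line, one (product, amount) pair otherwise.
    if line.strip():
        product, amount = line.split(",")
        yield product, int(amount)


def total(product, amounts):
    # Iterate over all values for one key and emit a single output.
    yield sum(amounts)


lines = ["apple,3", "", "pear,2", "apple,4"]
print(run_mapreduce(enumerate(lines), parse_sales, total))
# {'apple': [7], 'pear': [2]}
```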

MapReduce makes it easy to parallelize large computations, and it uses re-execution of failed tasks as its primary mechanism for fault tolerance.
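Re-execution works because map and reduce tasks are deterministic functions of their input. Running a failed task again produces the same output. The sketch below is a simplified, single-process illustration, not the real scheduler. A thread pool stands in for worker machines, and a hypothetical `FlakyWorker` crashes on the first attempt of chosen tasks. A hypothetical `run_with_reexecution` loop resubmits failed tasks until they succeed or a retry limit is reached.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FlakyWorker:
    """Hypothetical worker that fails once on each task ID in fail_once."""

    def __init__(self, fail_once):
        self.fail_once = set(fail_once)
        self.lock = threading.Lock()

    def run(self, task_id, fn, arg):
        with self.lock:
            should_fail = task_id in self.fail_once
            self.fail_once.discard(task_id)  # only the first attempt fails
        if should_fail:
            raise RuntimeError(f"worker failed on task {task_id}")
        return fn(arg)


def run_with_reexecution(fn, args, worker, max_rounds=3, workers=4):
    results = {}
    pending = list(range(len(args)))
    round_no = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending and round_no < max_rounds:
            round_no += 1
            futures = {i: pool.submit(worker.run, i, fn, args[i]) for i in pending}
            pending = []
            for i, fut in futures.items():
                try:
                    results[i] = fut.result()
                except RuntimeError:
                    pending.append(i)  # schedule this task again next round
    if pending:
        raise RuntimeError(f"tasks {pending} failed after {max_rounds} rounds")
    return [results[i] for i in range(len(args))]


worker = FlakyWorker(fail_once={1, 3})
print(run_with_reexecution(lambda s: len(s.split()), ["a b", "c", "d e f", "g h"], worker))
# [2, 1, 3, 2]
```

Because the task function is deterministic, the final list is the same as a run in which no worker ever failed.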

The following pseudocode counts the number of occurrences of each word in a collection of documents:

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
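The word-count pseudocode translates fairly directly into Python. The sketch below runs both functions on a single machine. The in-memory grouping step is an assumption, as are the names `word_count_map`, `word_count_reduce`, and `word_count`; none of these are part of the original framework. As in the pseudocode, counts are passed around as strings.

```python
from collections import defaultdict


def word_count_map(key, value):
    # key: document name; value: document contents
    for w in value.split():
        yield w, "1"  # like EmitIntermediate(w, "1")


def word_count_reduce(key, values):
    # key: a word; values: an iterator over the counts emitted for that word
    result = 0
    for v in values:
        result += int(v)  # like ParseInt(v)
    return str(result)  # like Emit(AsString(result))


def word_count(documents):
    # Group intermediate values by key, then reduce each group.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in word_count_map(name, contents):
            intermediate[word].append(count)
    return {word: word_count_reduce(word, iter(counts))
            for word, counts in intermediate.items()}


counts = word_count({"doc1": "to be or not to be", "doc2": "to do"})
print(counts["to"], counts["be"], counts["do"])  # 3 2 1
```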

Note: Passing the values to Reduce as an iterator lets the framework handle lists of values that are too large to fit in memory.
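In the paper, the reduce worker sorts the intermediate data by key and then iterates over it. This lets each key's values be streamed to Reduce rather than collected into a list first. The sketch below imitates this with `itertools.groupby` over an already-sorted stream. The tab-separated line format is an assumption, as are the helper names `read_pairs`, `reduce_sum`, and `run_reduce`. An `io.StringIO` object stands in for an intermediate file on disk.

```python
import io
from itertools import groupby


def read_pairs(stream):
    # Lazily parse "key<TAB>value" lines, one line at a time.
    for line in stream:
        key, value = line.rstrip("\n").split("\t")
        yield key, value


def reduce_sum(key, values):
    # Consumes the iterator one value at a time and never builds a list.
    total = 0
    for v in values:
        total += int(v)
    return key, total


def run_reduce(sorted_stream, reduce_fn):
    # groupby only merges adjacent equal keys, so the input must be sorted by key.
    for key, group in groupby(read_pairs(sorted_stream), key=lambda kv: kv[0]):
        yield reduce_fn(key, (v for _, v in group))


data = io.StringIO("be\t1\nbe\t1\nto\t1\nto\t1\nto\t1\n")
print(list(run_reduce(data, reduce_sum)))  # [('be', 2), ('to', 3)]
```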
