Cesare Rossi crossi202


Setting up the number of mappers per job

Rationale

The number of mappers per job is a function of the number of blocks across all the files used as input for the MapReduce job. It may be necessary to set the number of mappers per job explicitly when, for instance, the inputs are just references to files: the input file containing the references occupies only one HDFS block, so by default the job would get a single mapper no matter how many files are referenced.
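To make the rationale concrete, here is a minimal sketch of the default behaviour, assuming roughly one map task per HDFS block of each input file. The function name `estimate_mappers` and the 64 MB default block size are assumptions made for illustration, not part of Hadoop's API, and the estimate ignores split-size settings and other input-format details.

```python
def estimate_mappers(file_sizes, block_size=64 * 1024 * 1024):
    """Rough estimate: one map task per HDFS block of each input file.

    file_sizes: sizes of the input files in bytes.
    block_size: assumed HDFS block size in bytes (64 MB is an assumption).
    """
    # Ceiling division per file, because blocks are not shared between files.
    return sum(-(-size // block_size) for size in file_sizes)


MB = 1024 * 1024

# A small input file that only lists references to other files fits in one block.
references_only = estimate_mappers([10 * 1024])

# Two regular input files of 200 MB and 10 MB span 4 + 1 blocks.
regular_inputs = estimate_mappers([200 * MB, 10 * MB])
```

Under these assumptions the references-only input yields a single mapper, which is the case where setting the number of mappers explicitly becomes useful.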

Procedure

  • Add the following property to the mapred-site.xml configuration file on all the tasktracker nodes of the Hadoop cluster: