@crossi202
Last active January 24, 2016 10:12

Setting up the number of mappers per job

Rationale

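The number of mappers per job is a function of the number of blocks across all the files used as input for the MapReduce job. It may be necessary to set the number of mappers explicitly when, for instance, the inputs are just references to files: the input file containing the references occupies a single HDFS block and would otherwise be processed by a single mapper. Hadoop treats the requested number as a hint; the job's InputFormat makes the final decision on how many splits, and therefore mappers, are created.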
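To illustrate how a requested number of mappers can turn a single-block input into several splits, here is a minimal sketch of the split arithmetic used by the old-API `FileInputFormat` (split size = max(min size, min(total size / requested maps, block size)), with a 10% slop on the last split). The function name `estimate_num_splits`, the 64 MB default block size, and the example sizes are assumptions made for illustration; the sketch ignores non-splittable and empty files, and other InputFormats may behave differently.

```python
# Rough model of how the old-API FileInputFormat sizes its splits.
# Not Hadoop code: a simplified sketch for reasoning about mapper counts.

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def estimate_num_splits(file_sizes, requested_maps,
                        block_size=64 * 1024 * 1024, min_split_size=1):
    """Estimate the number of input splits (one mapper per split).

    file_sizes      -- sizes in bytes of the input files
    requested_maps  -- the requested number of map tasks (a hint)
    block_size      -- assumed HDFS block size (hypothetical default)
    min_split_size  -- assumed minimum split size in bytes
    """
    total_size = sum(file_sizes)
    goal_size = total_size // max(requested_maps, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    count = 0
    for size in file_sizes:
        remaining = size
        # Cut full-sized splits while more than ~1.1 splits of data remain.
        while remaining / split_size > SPLIT_SLOP:
            count += 1
            remaining -= split_size
        # Whatever is left becomes the final (possibly slightly larger) split.
        if remaining > 0:
            count += 1
    return count
```

Under these assumptions, a 1000-byte file of references yields one split when one map is requested and ten splits when ten maps are requested, which is the situation described above.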

Procedure

  • Add the following property to the mapred-site.xml configuration file on all the TaskTracker nodes of the Hadoop cluster, setting the value to the desired number of mappers per job:
<property><name>mapred.map.tasks</name><value>   </value></property>
  • Restart all the TaskTrackers.
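When the same edit has to be applied on many nodes, it can help to script it. The following is a minimal sketch, not part of the original procedure, that adds or updates a property in the text of a Hadoop-style configuration file using Python's standard `xml.etree.ElementTree`. The helper name `set_property` and the value `8` in the usage line are hypothetical; reading and writing the actual mapred-site.xml file and distributing it to the nodes is left out.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml, name, value):
    """Return config_xml with <property> `name` set to `value`.

    config_xml is the text of a Hadoop-style file whose root element
    is <configuration>. An existing property is updated in place;
    otherwise a new <property> element is appended.
    """
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                # Property exists but has no <value> child yet.
                value_el = ET.SubElement(prop, "value")
            value_el.text = str(value)
            break
    else:
        # No matching property found: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, `set_property("<configuration></configuration>", "mapred.map.tasks", 8)` returns a configuration containing the property shown above with an illustrative value of 8.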

Setting up the number of mappers per node

Rationale

It may be necessary to cap the number of mappers that run concurrently on a node. This is the case when another parallelisation system (MPI, for example) is already used at node level, and running many map tasks at once would oversubscribe the node.

Procedure

  • Add the following property to the mapred-site.xml configuration file on all the TaskTracker nodes of the Hadoop cluster, setting the value to the maximum number of map tasks allowed to run simultaneously on each node:
<property><name>mapred.tasktracker.map.tasks.maximum</name><value>  </value></property>
  • Restart all the TaskTrackers.
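The per-node limit also bounds how many mappers the whole cluster can run at once, which determines how many "waves" a job needs. The sketch below works through that arithmetic; it assumes every node uses the same limit and that no other job is competing for the map slots. The function names and the numbers in the usage note are hypothetical.

```python
import math


def concurrent_map_slots(num_nodes, maps_per_node):
    """Map tasks the cluster can run at the same time."""
    return num_nodes * maps_per_node


def map_waves(total_maps, num_nodes, maps_per_node):
    """Number of successive rounds needed to run all map tasks."""
    slots = concurrent_map_slots(num_nodes, maps_per_node)
    return math.ceil(total_maps / slots)
```

With 10 nodes and a limit of 2 mappers per node, 20 mappers run concurrently, so a job with 45 mappers needs 3 waves.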

Notes

Hadoop version: Hadoop 0.20 CDH3
