crossi202/MapReduce Mappers.md

## MapReduce Mappers.md

      
          
### Setting up the number of mappers per job

#### Rationale

The number of mappers per job is determined by the number of input splits, which by default corresponds roughly to the number of HDFS blocks across all the files used as input for the MapReduce job. It can be necessary to set the number of mappers per job explicitly when, for instance, the inputs are just references to files: the input file containing the references occupies just one HDFS block and would otherwise be processed by a single mapper.
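The following simplified Python model shows how input size, block size and `mapred.map.tasks` interact. It reflects our understanding of the split computation in the old-API `org.apache.hadoop.mapred.FileInputFormat` of Hadoop 0.20: the split size is `max(minSize, min(goalSize, blockSize))` with `goalSize = totalSize / mapred.map.tasks`. It is a sketch, not Hadoop code. The function name `count_splits` is hypothetical, and the model assumes non-empty, splittable (e.g. uncompressed text) files that share one block size, together with the default `mapred.min.split.size`.

```python
SPLIT_SLOP = 1.1  # a full-size split is cut while remaining/split_size > 1.1


def count_splits(file_sizes, block_size, num_splits_hint, min_split_size=1):
    """Approximate the number of map tasks produced by FileInputFormat.

    Hypothetical helper modelling Hadoop 0.20's old-API split logic.
    file_sizes: sizes in bytes of the (non-empty, splittable) input files.
    block_size: HDFS block size in bytes, assumed identical for all files.
    num_splits_hint: the value of mapred.map.tasks.
    """
    total_size = sum(file_sizes)
    goal_size = total_size // max(num_splits_hint, 1)
    # The split size never exceeds the block size, so the hint can only
    # make splits smaller (more mappers), never bigger than a block.
    split_size = max(min_split_size, min(goal_size, block_size))
    count = 0
    for length in file_sizes:
        remaining = length
        while remaining / split_size > SPLIT_SLOP:
            count += 1
            remaining -= split_size
        if remaining != 0:
            count += 1  # the tail of the file becomes its own split
    return count


if __name__ == "__main__":
    MB = 1024 * 1024
    print(count_splits([128 * MB] * 3, 64 * MB, 1))  # 6
    print(count_splits([128 * MB] * 3, 64 * MB, 2))  # 6
    print(count_splits([1024], 64 * MB, 1))          # 1
    print(count_splits([1024], 64 * MB, 8))          # 8
```

With 64 MB blocks, three 128 MB files give six splits regardless of the hint. A 1 KB file of references gives a single split by default and eight splits when the hint is 8. This is why raising `mapred.map.tasks` helps in the reference-file case.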
#### Procedure


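Add the following property to the `mapred-site.xml` configuration file on all the TaskTracker nodes of the Hadoop cluster. Set the value to the desired number of map tasks per job:

```xml
<property>
  <name>mapred.map.tasks</name>
  <value></value>
</property>
```

`mapred.map.tasks` is only a hint to the framework. The actual number of map tasks equals the number of input splits generated by the job's `InputFormat`. The splits are computed when the job is submitted, so the property must also be visible in the configuration of the client that submits the job. Alternatively, it can be set per job with `JobConf.setNumMapTasks()`.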
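Editing `mapred-site.xml` by hand on every node is error-prone. The Python sketch below uses only the standard library to add or update a property in the contents of a Hadoop configuration file. The helper name `set_hadoop_property` is hypothetical, copying the file to the nodes is not covered, and the value `16` in the example is purely illustrative.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text, name, value):
    """Return xml_text with property `name` set to `value`.

    Hypothetical helper: expects a <configuration> root containing
    <property><name>..</name><value>..</value></property> children and
    appends a new <property> if `name` is not present yet.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value  # overwrite the existing value
            break
    else:
        # No matching property: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    site = "<configuration></configuration>"
    # 16 is an illustrative value, not a recommendation.
    print(set_hadoop_property(site, "mapred.map.tasks", "16"))
```

The default `ElementTree` parser discards XML comments and processing instructions (such as an `xml-stylesheet` line). Keep a backup of the original file if those matter.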


Restart all the TaskTrackers.

### Setting up the number of mappers per node

#### Rationale

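It may be necessary to limit the number of map tasks that run at the same time on each node. This happens, for example, when another parallelisation system such as MPI is also used at node level and would otherwise compete with the map tasks for the same cores.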
#### Procedure


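Add the following property to the `mapred-site.xml` configuration file on all the TaskTracker nodes of the Hadoop cluster. Set the value to the maximum number of map tasks each TaskTracker runs simultaneously (the default is 2):

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value></value>
</property>
```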
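`mapred.tasktracker.map.tasks.maximum` sets the number of map slots of a TaskTracker, that is, how many map tasks it runs at the same time. One rough way to estimate the cost of lowering it (for example, to leave cores free for MPI processes) is to count the waves of map tasks a job needs. The Python sketch below is a back-of-the-envelope estimate, not a model of the actual JobTracker scheduler. The function name `map_waves` is hypothetical, and the sketch assumes that the job has the whole cluster, that every node has the same number of slots, and that map tasks take similar time.

```python
import math


def map_waves(num_map_tasks, num_nodes, map_slots_per_node):
    """Estimate how many successive waves of map tasks one job needs.

    Hypothetical helper. Simplifying assumptions: the job has the cluster
    to itself, every TaskTracker uses the same
    mapred.tasktracker.map.tasks.maximum, and map tasks take similar time.
    """
    if num_nodes < 1 or map_slots_per_node < 1:
        raise ValueError("need at least one node and one map slot per node")
    concurrent_maps = num_nodes * map_slots_per_node  # cluster-wide map slots
    return math.ceil(num_map_tasks / concurrent_maps)


if __name__ == "__main__":
    print(map_waves(100, 10, 2))  # 5
    print(map_waves(100, 10, 4))  # 3
```

For instance, 100 map tasks on 10 nodes need 5 waves with 2 slots per node and 3 waves with 4 slots per node.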


Restart all the TaskTrackers.

### Notes

Hadoop version: Hadoop 0.20 CDH3