Install on AWS EMR
1. We can adjust some Hadoop settings with a config.json, for example the HDFS block size (a sketch of passing the file at cluster creation follows the block):
[
  {
    "Classification": "hdfs-site",
    "Properties": {
      "dfs.blocksize": "67108864"
    }
  }
]
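
If the cluster is created from the AWS CLI rather than the console, the file can be passed with `--configurations`. A minimal sketch, assuming placeholder cluster name, key pair, release label, and instance settings:

```sh
# Launch an EMR cluster with config.json applied; all names and sizes are placeholders.
aws emr create-cluster \
  --name "parallelvid-cluster" \
  --release-label emr-5.5.0 \
  --applications Name=Hadoop Name=HBase Name=Spark \
  --configurations file://config.json \
  --ec2-attributes KeyName=my-key-pair \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles
```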
2. Choose HBase, Spark, Hadoop, etc., and wait until the cluster state changes to Running.
3. If we need to install many dependencies, we have to enlarge the EBS volume, because the default root partition is only 10 GB. After the modification, run `sudo resize2fs /dev/xvda1` to resize the root partition (a resize sketch follows below).

We may attempt to copy only the shared libraries to reduce the installation size. The MapR distribution has a setting to resize the EBS volume, while the Amazon AMI does not.
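
A minimal sketch of the resize, assuming the root device is `/dev/xvda1` as above (it may differ on other instance types):

```sh
# Check sizes, grow the root filesystem, then verify.
lsblk                      # block devices and partition sizes
df -h /                    # root filesystem size before resizing
sudo resize2fs /dev/xvda1  # grow the ext4 filesystem to fill the enlarged volume
df -h /                    # confirm the extra space is available
```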

4. Each node has the AWS CLI built in, so we can use EC2Box to run the install scripts.

EMR steps do not work here, since a step only runs on the master node (e.g., a custom JAR step with s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar and the argument s3://parallelvid/install.sh). A bootstrap action can run a script on all nodes instead, but it may corrupt the subsequent cluster installation, since bootstrap actions run before the cluster software is set up; see the sketch below.
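
A hedged sketch of the two options; the cluster id is a placeholder and `install.sh` is assumed to live at the S3 path above:

```sh
# Option 1: bootstrap action -- add this flag to the create-cluster command above.
# install.sh then runs on every node, but before EMR installs Hadoop/HBase/Spark.
#   --bootstrap-actions Path=s3://parallelvid/install.sh

# Option 2: step via script-runner -- runs after setup, but only on the master node.
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=CUSTOM_JAR,Name=install,ActionOnFailure=CONTINUE,Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://parallelvid/install.sh]'
```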

5. Then a setup script can be run on the master node to set up HDFS, HBase, etc.

For example, start the HBase Thrift server: `sudo -E /usr/lib/hbase/bin/hbase-daemon.sh start thrift -p 9097 --infoport 9098`. The ports may conflict with other software, so check the logs before proceeding.
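
A quick check after starting the daemon, with the port numbers from above; the exact log file name depends on the user and host, so the wildcard is a guess:

```sh
# Confirm the Thrift server and its info port are listening.
sudo netstat -tlnp | grep -E ':(9097|9098)'

# Look for port conflicts or startup errors in the daemon log.
tail -n 50 /var/log/hbase/hbase-*-thrift-*.log
```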

6. The EMR distribution is based on Apache Bigtop. You can find the related installed libraries under /usr/lib/.
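
For example, to see the installed components and the Bigtop-style packages behind them (package names vary with the EMR release):

```sh
# List the application directories shipped with the EMR release.
ls -d /usr/lib/hadoop* /usr/lib/hbase /usr/lib/spark 2>/dev/null

# Show the corresponding packages (Amazon Linux uses rpm/yum).
rpm -qa | grep -iE 'hadoop|hbase|spark' | sort
```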