Forked from samklr/
Created July 1, 2016 10:55
Setup HDFS on Mesos, Run Spark Cluster dispatcher via Marathon

Setup Mesos-DNS

Scripts for setting up

sudo mkdir /etc/mesos-dns
sudo vi /etc/mesos-dns/config.json


# replace the zk:// value with the IP of the master
{
  "zk": "zk://",
  "refreshSeconds": 60,
  "ttl": 60,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["",""],
  "timeout": 5,
  "email": "root.mesos-dns.mesos"
}
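
The config above can also be generated rather than edited by hand. The following is a minimal sketch, not part of the original setup: the function name `render_mesos_dns_config` is hypothetical, and it assumes ZooKeeper listens on port 2181 under the `/mesos` path (the same address the dispatcher definition further down uses) and that a single upstream resolver is enough.

```shell
# Sketch: print a mesos-dns config.json for a given master IP and resolver.
# Assumptions: ZooKeeper at <master_ip>:2181/mesos, one upstream resolver.
render_mesos_dns_config() {
  local master_ip="$1" resolver="$2"
  cat <<EOF
{
  "zk": "zk://${master_ip}:2181/mesos",
  "refreshSeconds": 60,
  "ttl": 60,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["${resolver}"],
  "timeout": 5,
  "email": "root.mesos-dns.mesos"
}
EOF
}
```

Example use (the IP addresses are placeholders): `render_mesos_dns_config 10.0.0.1 8.8.8.8 | sudo tee /etc/mesos-dns/config.json`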
sudo docker pull mesosphere/mesos-dns
sudo docker run --net=host -d -v "/etc/mesos-dns/config.json:/config.json" mesosphere/mesos-dns /mesos-dns -config=/config.json
# alternatively, run it as a named container that publishes only UDP port 53
docker run -d --name mesos-dns -p 53:53/udp -v /etc/mesos-dns/config.json:/config.json mesosphere/mesos-dns /mesos-dns -v 2 -config=/config.json

sudo sed -i "1s/^/nameserver $(hostname -i)\n/" /etc/resolv.conf
sudo sed -i "1s/^/prepend domain-name-servers $(hostname -i);\n/" /etc/dhcp/dhclient.conf
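
The `sed` commands above prepend a new line every time they run, so repeating the setup leaves duplicate entries. A possible idempotent variant is sketched below; `prepend_nameserver` is a hypothetical helper, it only handles the `nameserver` line format, and it must run as root when pointed at `/etc/resolv.conf`.

```shell
# Sketch: put "nameserver <ip>" at the top of a resolv.conf-style file,
# but only if that exact line is not already present.
prepend_nameserver() {
  local ip="$1" file="$2" tmp
  # Nothing to do if the entry already exists as a whole line.
  if grep -qxF "nameserver $ip" "$file" 2>/dev/null; then
    return 0
  fi
  tmp="$(mktemp)" || return 1
  { printf 'nameserver %s\n' "$ip"; cat "$file" 2>/dev/null; } > "$tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Example use: `prepend_nameserver "$(hostname -i)" /etc/resolv.conf`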


sudo docker run --net=host tutum/dnsutils dig
sudo docker run --net=host tutum/dnsutils dig master.mesos
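
Mesos-DNS may need a few seconds after start-up before `master.mesos` resolves, so a single `dig` can fail spuriously. The sketch below retries the lookup. Both helper names are hypothetical, and it assumes `dig` is installed on the host instead of being run through the `tutum/dnsutils` container.

```shell
# Sketch: run a command up to <attempts> times, sleeping <delay> seconds
# between tries; succeeds as soon as the command succeeds.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# True when dig returns at least one record for the given name.
resolves() {
  [ -n "$(dig +short "$1")" ]
}
```

Example use: `retry 10 3 resolves master.mesos && echo "Mesos-DNS is answering"`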

Build HDFS

clone the project git clone

There are a couple of ways of making the configuration changes we need.

copy all of the XML files from hdfs/example-conf/mesosphere-dcos/ to hdfs/conf

modify conf/mesos-site.xml

  • set mesos.hdfs.native-hadoop-binaries to false
  • set mesos.native.library to /usr/local/lib/


    <description>True if you have pre-installed hadoop binaries</description>
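
For reference, each setting in `conf/mesos-site.xml` is a Hadoop-style `<property>` element. The helper below is a small sketch that prints one such element so the two changes above can be compared against the file. `hadoop_property` is a hypothetical name, and the function does no XML escaping.

```shell
# Sketch: print a Hadoop-style <property> element for a name/value pair.
# Values are inserted verbatim; special XML characters are not escaped.
hadoop_property() {
  local name="$1" value="$2"
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
    "$name" "$value"
}
```

Example use: `hadoop_property mesos.hdfs.native-hadoop-binaries false`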

from the hdfs directory, build: ./bin/build-hdfs

scp the tarball to the master: scp hdfs-mesos-0.1.1.tgz root@$MESOS_MASTER:~

ssh to the master: ssh root@$MESOS_MASTER

untar hdfs: tar zxvf hdfs-mesos-*.tgz

start HDFS: cd hdfs-mesos-0.1.1 && ./bin/hdfs-mesos
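
The copy, untar, and start steps above can be chained into one helper. This is only a sketch: `run` and `deploy_hdfs_mesos` are hypothetical names, the `DRY_RUN` switch is an addition for previewing the commands, and it assumes root SSH access to the master as in the commands above.

```shell
# Sketch: print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the built tarball to the master, unpack it, and start the scheduler.
deploy_hdfs_mesos() {
  local master="$1" version="${2:-0.1.1}"
  local tarball="hdfs-mesos-${version}.tgz"
  run scp "$tarball" "root@${master}:~" || return 1
  run ssh "root@${master}" \
    "tar zxvf ${tarball} && cd hdfs-mesos-${version} && ./bin/hdfs-mesos"
}
```

Example use: `DRY_RUN=1 deploy_hdfs_mesos "$MESOS_MASTER"` prints the two commands without running them. If the scheduler stays in the foreground, the SSH session stays open for as long as it runs.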

###Purge pre-installed hadoop binaries on the slaves and the master if necessary
sudo aptitude purge hadoop hadoop-yarn hadoop-hdfs hadoop-hdfs-namenode hadoop-hdfs-datanode hadoop-0.20-mapreduce hadoop-0.20-mapreduce-jobtracker hadoop-0.20-mapreduce-tasktracker hadoop-mapreduce

###Delete the hadoop directories on the slaves
sudo rm -rf /etc/hadoop /mnt/hdfs /var/lib/hadoop* /var/log/hadoop

###Check http://$(active_namenode_ip):50070 to see if it's running
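
The same check can be done from a terminal instead of a browser. A minimal sketch follows, with hypothetical helper names; port 50070 is the one given above, and the probe only confirms that the web UI answers over HTTP, not that the namenode is active or healthy.

```shell
# Sketch: build the namenode web UI URL for a given IP or host name.
namenode_url() {
  printf 'http://%s:50070/\n' "$1"
}

# Succeeds if the web UI returns a successful HTTP response within 5 seconds.
namenode_up() {
  curl -fsS -o /dev/null --max-time 5 "$(namenode_url "$1")"
}
```

Example use (the IP is a placeholder): `namenode_up 10.0.0.5 && echo "namenode web UI is up"`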

try:
hadoop fs -ls hdfs://hdfs//
hadoop fs -put file.txt hdfs://hdfs

Load in some data. hadoop

Run spark jobs.

Spark terasort

###Testing for Failover

###Run DFSIO first to test I/O throughput (needs MapReduce installed)

###Run HiBench :

###Push it further

####Constraint with hdfs ...

####Resources Reservation for the hdfs framework

####Constraint with Spark ...

####Launch both (Spark + HDFS) within the same rack. Use Marathon here; it makes more sense.

####Launch TPC-H via SparkSQL + TPCDS Impala

if [ "$#" -ne 1 ]; then
  echo "script needs a json file as argument"
  exit 1
fi
curl -X POST -H "Content-Type: application/json" -d@"$@"
#http POST mesos.master:8080/v2/apps < @"$@"
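
A slightly more defensive version of the submit script is sketched below. `submit_app` is a hypothetical name, the default URL is taken from the commented-out `http POST` line above and is an assumption about where Marathon listens, and the `CURL` variable is an added knob that lets the command be swapped out for testing.

```shell
# Sketch: POST a Marathon app definition (a JSON file) to the apps endpoint.
# Assumption: Marathon is reachable at mesos.master:8080 unless a URL is given.
submit_app() {
  local file="$1" url="${2:-http://mesos.master:8080/v2/apps}"
  if [ ! -r "$file" ]; then
    echo "cannot read app definition: $file" >&2
    return 1
  fi
  "${CURL:-curl}" -X POST -H "Content-Type: application/json" -d "@$file" "$url"
}
```

Example use: `submit_app spark-dispatcher.json`, where the file name is a placeholder for one of the definitions below.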
"id": "spark-dispatcher",
"cpus": 2,
"mem": 2048,
"instances": 1,
"cmd": "mv /mnt/mesos/sandbox/ conf/ && ./bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher --port $PORT0 --webui-port $PORT1 --master mesos:// --zk --host $HOST --name spark",
"uris": [
"ports": [
"type" : "DOCKER",
"docker": {
"image" : "mesosphere/spark:1.5.0-rc2-hdfs", ## update to released 1.5
"healthChecks" : [
"portIndex" : 1,
"protocol" : "HTTP",
"gracePeriodSeconds" : 5,
"intervalSeconds" : 60,
"timeoutSeconds" : 10,
"id": "spark-mesos-dispatcher",
"cpus": 2,
"mem": 2048,
"instances": 1,
"cmd": "mv /mnt/mesos/sandbox/ conf/ && ./bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher --port $PORT0 --webui-port $PORT1 --master mesos://zk://master.mesos:2181/mesos --zk master.mesos:2181 --host $HOST --name spark",
"uris": [
"ports": [
"type" : "DOCKER",
"docker": {
"image" : "mesosphere/spark:1.5.0-rc2-hdfs",
"healthChecks" : [
"portIndex" : 1,
"protocol" : "HTTP",
"gracePeriodSeconds" : 5,
"intervalSeconds" : 60,
"timeoutSeconds" : 10,
"id": "spark-notebook",
"cpus": 0.5,
"mem": 3500,
"instances": 1,
"type" : "DOCKER",
"docker": {
"image" : "andypetrella/spark-notebook:0.6.1-scala-2.10.4-spark-1.5.0-hadoop-2.6.0-cdh5.4.4-with-hive-with-parquet",
"healthChecks": [
"protocol": "HTTP",
"portIndex": 0,
"path": "/",
"gracePeriodSeconds": 5,
"intervalSeconds": 20,
"maxConsecutiveFailures": 3
"ports": [0,0]
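
The app definitions above have lost their braces and nesting. As an illustration of how the spark-notebook fields could fit together, the sketch below prints a complete JSON document. The `container` wrapper and the object inside `healthChecks` follow the usual Marathon app layout and are an assumption about the original file; any fields the original had beyond those listed above are not reconstructed. `spark_notebook_app` is a hypothetical name.

```shell
# Sketch: print a Marathon app definition for spark-notebook.
# Assumption: DOCKER type and image sit under a "container" object, and the
# health check fields form one object inside the "healthChecks" array.
spark_notebook_app() {
  cat <<'EOF'
{
  "id": "spark-notebook",
  "cpus": 0.5,
  "mem": 3500,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "andypetrella/spark-notebook:0.6.1-scala-2.10.4-spark-1.5.0-hadoop-2.6.0-cdh5.4.4-with-hive-with-parquet"
    }
  },
  "healthChecks": [
    {
      "protocol": "HTTP",
      "portIndex": 0,
      "path": "/",
      "gracePeriodSeconds": 5,
      "intervalSeconds": 20,
      "maxConsecutiveFailures": 3
    }
  ],
  "ports": [0, 0]
}
EOF
}
```

Example use: `spark_notebook_app > spark-notebook.json`, then submit the file to Marathon.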