Follow the instructions in Spring Cloud Data Flow Runtime - Deploying on YARN.
For Spark 1.6.1, do the following:

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz
tar xvzf spark-1.6.1-bin-hadoop2.6.tgz
cd spark-1.6.1-bin-hadoop2.6
hadoop fs -mkdir -p /app/spark
hadoop fs -chmod 777 /app/spark
hadoop fs -copyFromLocal lib/spark-assembly-1.6.1-hadoop2.6.0.jar /app/spark/spark-assembly-1.6.1-hadoop2.6.0.jar
We are using a sample app from the Spark distribution that we have compiled as part of a test suite. We copy this file to HDFS:

wget https://repo.spring.io/snapshot/org/springframework/cloud/task/module/sparkapp-client-task/1.0.0.BUILD-SNAPSHOT/sparkapp-client-task-1.0.0.BUILD-SNAPSHOT-tests.jar
hadoop fs -copyFromLocal sparkapp-client-task-1.0.0.BUILD-SNAPSHOT-tests.jar /app/spark/sparkapp-client-task-1.0.0.BUILD-SNAPSHOT-tests.jar
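As an optional sanity check (assuming both copyFromLocal steps above succeeded; the exact listing output varies by cluster), you can verify that the jars are in place:

```shell
hadoop fs -ls /app/spark
```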
app register --type task --name spark-yarn --uri maven://org.springframework.cloud.task.app:spark-yarn-task:1.0.0.BUILD-SNAPSHOT
task create spark1 --definition "spark-yarn --appName=my-test-pi --appClass=org.apache.spark.examples.JavaSparkPi --appJar=hdfs:///app/spark/sparkapp-client-task-1.0.0.BUILD-SNAPSHOT-tests.jar --sparkAssemblyJar=hdfs:///app/spark/spark-assembly-1.6.1-hadoop2.6.0.jar --appArgs=10"
The following properties must be specified for the test app:
- appName
- appClass
- appJar
- appArgs
We set them to the values corresponding to the test app we downloaded above. We also need to specify the location of the Spark assembly jar file using the property:

- sparkAssemblyJar
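Once the definition is in place, the task can be launched from the Data Flow shell. A minimal sketch, using the spark1 definition created above:

```
task launch spark1
```

This is what produces the task execution shown in the listing below.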
Note
If you run this on a local server, you need to provide the Hadoop connection properties. Here is an example for a Hadoop server named "borneo":

--spring.hadoop.fsUri=hdfs://borneo:8020
--spring.hadoop.resourceManagerHost=borneo
--spring.hadoop.resourceManagerPort=8032
--spring.hadoop.jobHistoryAddress=borneo:10020
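As a sketch of an alternative, the same Hadoop connection properties could be kept in the server's application.yml rather than passed on the command line (the "borneo" hostname is carried over from the example above; the camelCase-to-YAML mapping relies on Spring Boot's relaxed property binding):

```yaml
spring:
  hadoop:
    fsUri: hdfs://borneo:8020
    resourceManagerHost: borneo
    resourceManagerPort: 8032
    jobHistoryAddress: borneo:10020
```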
task execution list
╔═════════╤══╤════════════════════════════╤════════════════════════════╤═════════╗
║Task Name│ID│ Start Time │ End Time │Exit Code║
╠═════════╪══╪════════════════════════════╪════════════════════════════╪═════════╣
║spark1 │1 │Tue Jun 07 19:32:27 EDT 2016│Tue Jun 07 19:32:52 EDT 2016│0 ║
╚═════════╧══╧════════════════════════════╧════════════════════════════╧═════════╝
We can now list the log for the container that ran the Spark app, using commands on the Hadoop server:
$ more logs/userlogs/application_1465327822589_0008/container_1465327822589_0008_01_000001/stdout
Pi is roughly 3.146104
If log aggregation is enabled, use the yarn logs command to accomplish the same.
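For example, using the application id from the container log path above (a sketch; yarn logs only works once log aggregation is enabled and the application has finished):

```shell
yarn logs -applicationId application_1465327822589_0008
```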
Note
If you are using a small single-node Hadoop cluster for testing, you might have to increase the scheduler settings for maximum memory per app. We modified the capacity-scheduler.xml file and set yarn.scheduler.capacity.maximum-am-resource-percent to 0.5. We also modified yarn-site.xml and set yarn.nodemanager.resource.memory-mb to 8192.
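As config fragments, the two changes from this note look roughly like the following (property names and values are taken from the note itself; the location of the config files depends on your Hadoop distribution):

```xml
<!-- capacity-scheduler.xml: allow the ApplicationMaster to use up to half the cluster memory -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>

<!-- yarn-site.xml: memory (in MB) available to containers on the node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
```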