# Check the status of a submission
curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true,
  "workerHostPort" : "192.168.3.153:46894",
  "workerId" : "worker-20151007093409-192.168.3.153-46894"
}
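To wait for a driver to finish, you can poll this endpoint; a minimal sketch, assuming jq is installed (the host and submission id are the example values above):
# Poll the status endpoint until the driver leaves the SUBMITTED/RUNNING states.
while true; do
  state=$(curl -s http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000 \
    | jq -r .driverState)
  echo "driverState: $state"
  [ "$state" != "RUNNING" ] && [ "$state" != "SUBMITTED" ] && break
  sleep 5
done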
# Kill a submission
curl -X POST http://spark-cluster-ip:6066/v1/submissions/kill/driver-20151008145126-0000
{
  "action" : "KillSubmissionResponse",
  "message" : "Kill request for driver-20151008145126-0000 submitted",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true
}
# Create a submission
curl -X POST http://spark-cluster-ip:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "myAppArgument1" ],
  "appResource" : "file:/myfilepath/spark-job-1.0.jar",
  "clientSparkVersion" : "1.5.0",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "com.mycompany.MyJob",
  "sparkProperties" : {
    "spark.jars" : "file:/myfilepath/spark-job-1.0.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyJob",
    "spark.eventLog.enabled" : "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://spark-cluster-ip:6066"
  }
}'
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20151008145126-0000",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true
}
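Putting the three calls together, a minimal end-to-end sketch (assuming jq is installed; create-submission.json is a hypothetical file holding the request body shown above):
# Submit, capture the submissionId, then query its status.
SUBMISSION_ID=$(curl -s -X POST http://spark-cluster-ip:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data @create-submission.json | jq -r .submissionId)
echo "submitted as $SUBMISSION_ID"
curl -s http://spark-cluster-ip:6066/v1/submissions/status/$SUBMISSION_ID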
Hi,
Is it possible to submit a .py file as well? In this part:
"sparkProperties" : {
  "spark.submit.pyFiles" : "/path/to/py/file/file.py",
When submitting through submit_job.sh, the submitted task will not have correct links in the Spark Master UI. For instance, instead of the Streaming button going to http://master/proxy/app-id/streaming, it goes to http://master/streaming, which is just the main page of the Spark master. To solve this I checked the Spark source code: if you add these two config lines to the sparkProperties object, the links work and the graphs are presented correctly:
"spark.ui.reverseProxy" : "true",
"spark.ui.reverseProxyUrl" : "proxy"
The Spark source code section that helped.
Although I am using this, I am not fully sure about the consequences. I assume spark-submit also sets these two configs implicitly, so with these two lines added, running through the hidden API behaves just like spark-submit. Still, take a look at the Spark Configuration docs for these two settings, just to make sure they don't impact anything else.
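For concreteness, merged into the sparkProperties object of the create payload shown at the top of the page (a sketch; the other properties stay as they were):
"sparkProperties" : {
  "spark.master" : "spark://spark-cluster-ip:6066",
  "spark.submit.deployMode" : "cluster",
  "spark.ui.reverseProxy" : "true",
  "spark.ui.reverseProxyUrl" : "proxy"
}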
Anyone know if it's possible to recover the payload used to submit a Spark job? We have an issue where some of our Spark jobs intermittently fail on Mesos, and it would be nice to resubmit the same payload (1% or less of our jobs fail this way). To do this we need the JSON used to create the job, but I would like to avoid keeping a database somewhere that maps driver id to submit data.
Dear experts,
Thanks for your useful information regarding the hidden REST API of Spark.
We have a Spark cluster configured for high availability, so the master URL looks like spark://master1:6066,master2:6066.
In this case, what would be the best practice for using the hidden REST API?
Best,
Tien Dat
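The REST endpoint itself takes a single host, so one approach (a sketch, not from this thread) is to try each master in turn and keep the response from whichever accepts the submission, since only the currently active master will serve it. Hostnames and the payload file are placeholders:
# Try each master; only the active one accepts REST submissions.
for master in master1 master2; do
  response=$(curl -s -X POST "http://$master:6066/v1/submissions/create" \
    --header "Content-Type:application/json;charset=UTF-8" \
    --data @create-submission.json)
  echo "$response" | grep -q '"success" : true' && { echo "$response"; break; }
done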
We found that we had to set the following property in order to enable the Spark REST API:
spark.master.rest.enabled true
We found that we had to set the following property in order to enable the Spark REST API:
spark.master.rest.enabled true
Where did you set this property?
@damienn set it in the spark-defaults.conf file for standalone Spark.
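If editing spark-defaults.conf is not convenient, the same property can also be passed as a JVM option when the standalone master starts; a sketch using the standard spark-env.sh hook:
# In conf/spark-env.sh on the master node (alternative to spark-defaults.conf):
SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true"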
How do I pass application arguments and conf through these APIs?
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  [application-arguments]
Hi, is there any solution for this?
Hi, is there any solution for this?
I resolved this by inspecting the source code, and found that directly appending the setting inside the sparkProperties node works fine, like:
"sparkProperties" : {
  "spark.foo.bar" : "xxx"
}
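For the rest of the question: --class maps to mainClass, --master to the spark.master property, --deploy-mode to spark.submit.deployMode, each --conf key=value to an entry in sparkProperties, and [application-arguments] to the appArgs array. A sketch reusing the placeholders from the first example on this page:
{
  "action" : "CreateSubmissionRequest",
  "mainClass" : "com.mycompany.MyJob",
  "appResource" : "file:/myfilepath/spark-job-1.0.jar",
  "appArgs" : [ "myAppArgument1" ],
  "clientSparkVersion" : "1.5.0",
  "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },
  "sparkProperties" : {
    "spark.master" : "spark://spark-cluster-ip:6066",
    "spark.submit.deployMode" : "cluster",
    "spark.app.name" : "MyJob",
    "spark.jars" : "file:/myfilepath/spark-job-1.0.jar"
  }
}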
The CreateSubmissionRequest API does not push the .py file to the worker. How do I fix it?
The CreateSubmissionRequest API does not push the .py file to the worker. How do I fix it?
The API call does not transfer anything except the Spark configuration; files such as .py or .jar have to already be present on all Spark workers. You can distribute the files to all workers yourself or use a shared filesystem such as NFS.
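A minimal sketch of the first option, copying the application file to the same path on every worker before submitting (hostnames and path are placeholders):
# Copy the application file to the same path on each worker.
for host in worker1 worker2 worker3; do
  scp /myfilepath/spark-job-1.0.jar "$host:/myfilepath/"
done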
When running Spark on Kubernetes (one pod for the master and one pod for the worker), every time I submit an application via the API to the master, the app runs on the worker but the driver exits with a failure. Does anyone have any idea why this might be? If I run the traditional submit script on one of the worker pods, it works fine.
I am facing the same scenario as well, @LasseJacobs. Can anyone share the exact curl command to submit a Python file?
When running Spark on Kubernetes (one pod for the master and one pod for the worker), every time I submit an application via the API to the master, the app runs on the worker but the driver exits with a failure. Does anyone have any idea why this might be? If I run the traditional submit script on one of the worker pods, it works fine.
@LasseJacobs - Did you manage to fix this issue?
Yes, I was able to get it working in the end. I don't remember exactly, but I think I needed to add "spark.driver.supervise": "true". Here is my full shell script to try it out:
curl -X POST http://cluster:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
  "appResource": "file:///path/to/file.jar",
  "sparkProperties": {
    "spark.executor.memory": "2g",
    "spark.master": "spark://spark-master:7077",
    "spark.driver.memory": "2g",
    "spark.driver.cores": "1",
    "spark.eventLog.enabled": "false",
    "spark.app.name": "Spark REST API - PI",
    "spark.submit.deployMode": "cluster",
    "spark.jars": "file:///opt/bitnami/spark/apps/dwh-plumber-1.0.jar",
    "spark.driver.supervise": "true"
  },
  "clientSparkVersion": "3.2.0",
  "mainClass": "App",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "action": "CreateSubmissionRequest",
  "appArgs": [ "" ]
}'
Let me know whether this works for you; if it doesn't, I will remove it, because I am not sure anymore that this was the only thing we had to do to get it working.
Also, the REST API has to be enabled. Here is an example config:
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-master-config
data:
  spark-defaults.conf: |
    spark.master.rest.enabled true
    spark.driver.host spark-master
    spark.driver.port 7077
and then mount the volume:
volumeMounts:
  - name: config-volume
    mountPath: /opt/bitnami/spark/conf/spark-defaults.conf
    subPath: spark-defaults.conf
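The mount above assumes the pod spec also declares a matching volume backed by the ConfigMap; a sketch:
volumes:
  - name: config-volume
    configMap:
      name: spark-master-config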
@LasseJacobs Hi, how can we update the values.yaml to do this? I can't get it right, since the current emptyDir volume mount conflicts.