@arturmkrtchyan
Last active October 22, 2024 05:45
Apache Spark Hidden REST API
curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000

{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true,
  "workerHostPort" : "192.168.3.153:46894",
  "workerId" : "worker-20151007093409-192.168.3.153-46894"
}
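
A script that waits on a driver usually needs to decide whether the driverState in a status response is final. The following is a minimal Python sketch of that check. The function name and the set of terminal states are assumptions made for illustration; only FINISHED appears in the sample response, so adjust the set to whatever your Spark version reports.

```python
import json

# Assumed terminal states; only FINISHED is confirmed by the sample response.
TERMINAL_STATES = {"FINISHED", "KILLED", "FAILED", "ERROR"}


def is_terminal(status_body: str) -> bool:
    """Return True if a SubmissionStatusResponse body reports a final driver state."""
    response = json.loads(status_body)
    if not response.get("success", False):
        # The server answered, but the status lookup itself did not succeed.
        raise ValueError("status request failed: " + str(response.get("message")))
    return response.get("driverState") in TERMINAL_STATES


# The sample response from the status call.
sample = """{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true,
  "workerHostPort" : "192.168.3.153:46894",
  "workerId" : "worker-20151007093409-192.168.3.153-46894"
}"""

print(is_terminal(sample))
```

A polling loop would call the status endpoint, pass the body to is_terminal, and sleep between attempts until it returns True.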
curl -X POST http://spark-cluster-ip:6066/v1/submissions/kill/driver-20151008145126-0000

{
  "action" : "KillSubmissionResponse",
  "message" : "Kill request for driver-20151008145126-0000 submitted",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true
}
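
The status, kill, and create calls share the /v1/submissions/ URL layout, so a small helper can build all three. The following Python sketch uses only the standard library. The helper names submission_url and kill_submission are hypothetical, and the kill call mirrors the bodiless POST that curl sends here. It is a sketch under those assumptions, not a tested client.

```python
import json
import urllib.request


def submission_url(base_url, action, submission_id=None):
    """Build a REST URL such as <base>/v1/submissions/kill/<id>."""
    url = base_url.rstrip("/") + "/v1/submissions/" + action
    if submission_id is not None:
        url += "/" + submission_id
    return url


def kill_submission(base_url, submission_id):
    """POST to the kill endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        submission_url(base_url, "kill", submission_id), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Only the URL is built here; no request is sent.
print(submission_url("http://spark-cluster-ip:6066", "kill",
                     "driver-20151008145126-0000"))
```

Calling kill_submission("http://spark-cluster-ip:6066", "driver-20151008145126-0000") against a reachable master should return a dictionary shaped like the KillSubmissionResponse shown here.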
curl -X POST http://spark-cluster-ip:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "myAppArgument1" ],
  "appResource" : "file:/myfilepath/spark-job-1.0.jar",
  "clientSparkVersion" : "1.5.0",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "com.mycompany.MyJob",
  "sparkProperties" : {
    "spark.jars" : "file:/myfilepath/spark-job-1.0.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyJob",
    "spark.eventLog.enabled" : "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://spark-cluster-ip:6066"
  }
}'

{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20151008145126-0000",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true
}
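
Hand-editing the JSON inside a shell string is error-prone, so it can help to generate the request body from code. The following Python sketch builds a body with the same fields as the create request. The function name build_create_request and its parameters are hypothetical, and the fixed values ("spark.driver.supervise", "spark.eventLog.enabled", SPARK_ENV_LOADED) are copied from the example rather than taken from any specification.

```python
import json


def build_create_request(app_resource, main_class, app_args,
                         master_url, spark_version, app_name):
    """Build a CreateSubmissionRequest body shaped like the gist's example."""
    return {
        "action": "CreateSubmissionRequest",
        "appArgs": list(app_args),
        "appResource": app_resource,
        "clientSparkVersion": spark_version,
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "mainClass": main_class,
        "sparkProperties": {
            # The example lists the application jar in spark.jars as well.
            "spark.jars": app_resource,
            "spark.driver.supervise": "false",
            "spark.app.name": app_name,
            "spark.eventLog.enabled": "true",
            "spark.submit.deployMode": "cluster",
            "spark.master": master_url,
        },
    }


payload = build_create_request(
    app_resource="file:/myfilepath/spark-job-1.0.jar",
    main_class="com.mycompany.MyJob",
    app_args=["myAppArgument1"],
    master_url="spark://spark-cluster-ip:6066",
    spark_version="1.5.0",
    app_name="MyJob",
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed to curl with --data, or posted by any HTTP client that sets the same Content-Type header as the curl command.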
@viktor-pecheniuk

@damienn Set it in the spark-defaults.conf file for standalone Spark.

@arjenzhou

How do I pass application arguments and conf through these APIs?

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

Hi, are there any solutions for this?

@arjenzhou

arjenzhou commented Nov 24, 2020

Hi, are there any solutions for this?

I resolved this by reading the source code: adding the setting directly inside the sparkProperties node works fine.

For example:

"sparkProperties" : {
    "spark.foo.bar" : "xxx"
}
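
To carry this over from an existing spark-submit command line, each --conf key=value pair becomes one entry in sparkProperties, while application arguments go in the appArgs array shown in the gist's create request. The following is a minimal Python sketch of that mapping. The function name conf_to_spark_properties is hypothetical, and it assumes every item has the form key=value.

```python
def conf_to_spark_properties(conf_args):
    """Turn spark-submit style "key=value" strings into a sparkProperties dict."""
    properties = {}
    for item in conf_args:
        # Split on the first "=" only, so values may themselves contain "=".
        key, separator, value = item.partition("=")
        if not separator or not key.strip():
            raise ValueError("expected key=value, got: " + repr(item))
        properties[key.strip()] = value.strip()
    return properties


spark_properties = conf_to_spark_properties([
    "spark.foo.bar=xxx",
    "spark.executor.memory=2g",
])
print(spark_properties)
```

The resulting dictionary can be merged into the sparkProperties object of the create request before it is serialized to JSON.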

@logunv

logunv commented May 14, 2021

The API CreateSubmissionRequest does not push the pyfile to the worker. How do I fix it?

@misurin

misurin commented Jun 16, 2021

The API CreateSubmissionRequest does not push the pyfile to the worker. How do I fix it?

The API call passes nothing except the Spark configuration. Files such as .py and .jar files must already be present on every Spark worker; you can either distribute them to all workers or use NFS.

@LasseJacobs

When running Spark in Kubernetes (one pod for the master and one pod for the worker), every time I submit an application via the API to the master, the app runs on the worker but the driver exits with a failure. Does anyone have an idea why this might be? If I run the traditional submit script on one of the worker pods, it works fine.

@Chethu7781

I am facing the same issue as well, @LasseJacobs. Can anyone share the exact curl command to submit a Python file?

@ardesai

ardesai commented Feb 21, 2023

When running Spark in Kubernetes (one pod for the master and one pod for the worker), every time I submit an application via the API to the master, the app runs on the worker but the driver exits with a failure. Does anyone have an idea why this might be? If I run the traditional submit script on one of the worker pods, it works fine.

@LasseJacobs - Did you manage to fix this issue?

@LasseJacobs

LasseJacobs commented Feb 21, 2023

Yes, I was able to get it working in the end. I don't remember exactly, but I think I needed to add "spark.driver.supervise": "true". Here is my full shell script to try it out:

curl -X POST http://cluster:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "appResource": "file:///path/to/file.jar",
  "sparkProperties": {
    "spark.executor.memory": "2g",
    "spark.master": "spark://spark-master:7077",
    "spark.driver.memory": "2g",
    "spark.driver.cores": "1",
    "spark.eventLog.enabled": "false",
    "spark.app.name": "Spark REST API - PI",
    "spark.submit.deployMode": "cluster",
    "spark.jars" : "file:///opt/bitnami/spark/apps/dwh-plumber-1.0.jar",
    "spark.driver.supervise": "true"
  },
  "clientSparkVersion": "3.2.0",
  "mainClass": "App",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "action": "CreateSubmissionRequest",
  "appArgs": [ "" ]
}'

Let me know whether this works for you. If it doesn't, I will remove it, because I am no longer sure this was the only change we needed to get it working.

@LasseJacobs

Also, the REST API has to be enabled. Here is an example config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-master-config
data:
  spark-defaults.conf: |
    spark.master.rest.enabled true
    spark.driver.host spark-master
    spark.driver.port 7077

and then mount the volume:

volumeMounts:
  - name: config-volume
    mountPath: /opt/bitnami/spark/conf/spark-defaults.conf
    subPath: spark-defaults.conf

@isbn390

isbn390 commented Oct 22, 2024

@LasseJacobs Hi, how can we update values.yaml to do this? I can't get it right, because the existing emptyDir volume mount conflicts with it.
