@yuanzhaoYZ
Last active July 9, 2017 02:44

PySpark

spark-submit

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name pyspark_job \
  --driver-memory 2G \
  --driver-cores 2 \
  --executor-memory 12G \
  --executor-cores 5 \
  --num-executors 10 \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --conf spark.task.maxFailures=36 \
  --conf spark.driver.maxResultSize=0 \
  --conf spark.network.timeout=800s \
  --conf spark.scheduler.listenerbus.eventqueue.size=500000 \
  --conf spark.speculation=true \
  --py-files lib.zip,lib1.zip,lib2.zip \
  spark_test.py
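To get a feel for what this command asks of the cluster, the executor settings can be added up. The sketch below is only back-of-the-envelope arithmetic: the helper name `yarn_request` is made up for illustration, it assumes the default of one CPU per task, and it ignores the driver container and any rounding YARN applies to container sizes.

```python
def yarn_request(executor_memory_mb, overhead_mb, num_executors, executor_cores):
    """Rough executor-side totals for a spark-submit on YARN (hypothetical helper)."""
    # Each executor container needs the heap plus the off-heap overhead.
    per_executor_mb = executor_memory_mb + overhead_mb
    return {
        "per_executor_mb": per_executor_mb,
        "total_executor_mb": per_executor_mb * num_executors,
        # Assumes spark.task.cpus is left at its default of 1.
        "total_task_slots": executor_cores * num_executors,
    }


# Values taken from the command above: 12G heap, 4096 MB overhead,
# 10 executors, 5 cores each.
totals = yarn_request(12 * 1024, 4096, 10, 5)
print(totals)
```

With these settings each executor container is about 16 GB, so the job needs roughly 160 GB of executor memory in total and can run up to 50 tasks at once.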

spark_test.py

import pyspark
import sys
from pyspark.sql import SQLContext

sc = pyspark.SparkContext()
# Register the archives shipped via --py-files so their modules can be imported.
sc.addPyFile('lib.zip')
sc.addPyFile('lib1.zip')
sc.addPyFile('lib2.zip')

# Module names match the archives registered above (lib.zip, lib1.zip, lib2.zip).
from lib import XX
from lib1 import XX2
from lib2 import XX3

....
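The archives passed to --py-files have to be built so that the package directory sits at the top level of the zip, otherwise an import such as `from lib import XX` will not resolve. A minimal sketch of one way to build such an archive with the standard library follows; the helper name `build_py_zip` is an assumption for illustration and is not part of the original gist.

```python
import os
import zipfile


def build_py_zip(package_dir, zip_path):
    """Zip the .py files of a package, keeping the package name as the top-level entry."""
    # Archive paths are made relative to the package's parent directory,
    # so "path/to/lib/__init__.py" is stored as "lib/__init__.py".
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    zf.write(full, os.path.relpath(full, parent))
    return zip_path
```

For example, `build_py_zip("lib", "lib.zip")` run next to a `lib/` package directory would produce an archive that can be passed to --py-files as shown above.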

