@vietvudanh
Created March 23, 2020 02:36
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save vietvudanh/d49a73301b76a62e6612255c41e5bca4 to your computer and use it in GitHub Desktop.
Save vietvudanh/d49a73301b76a62e6612255c41e5bca4 to your computer and use it in GitHub Desktop.
pyspark
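from pyspark.sql import SparkSession

# config: input and output locations for the job (left blank in this template)
INPUT = ""
OUTPUT = ""


def main():
    spark = (
        SparkSession.builder
        .appName("Viet PySpark")
        # Let Spark scale the executor pool between 1 and 16 executors;
        # dynamic allocation relies on the external shuffle service.
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.maxExecutors", "16")
        .config("spark.dynamicAllocation.minExecutors", "1")
        .config("spark.shuffle.service.enabled", "true")
        # Use the Hive metastore and this cluster's Hive warehouse location.
        .config("spark.sql.warehouse.dir", "/apps/hive/warehouse")
        .config("spark.sql.catalogImplementation", "hive")
        # If a port (e.g. the Spark UI on 4040) is taken, try up to 32 successive ports.
        .config("spark.port.maxRetries", "32")
        .getOrCreate()
    )
    try:
        # job: add the processing logic here
        pass
    finally:
        # SparkSession.stop() also stops the underlying SparkContext.
        spark.stop()


if __name__ == "__main__":
    main()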
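The session enables dynamic allocation with minExecutors 1 and maxExecutors 16 and turns on the external shuffle service, which dynamic allocation requires in Spark 2.x (Spark 3.0+ also accepts shuffle tracking, among other options). The plain-Python sketch below checks those settings before a session is built and needs no Spark install. SESSION_CONF, is_true and check_dynamic_allocation are hypothetical names, and the check is a simplified assumption, not Spark's own validation.

```python
# Session settings copied from the script above (values kept as strings).
SESSION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.maxExecutors": "16",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.shuffle.service.enabled": "true",
}


def is_true(conf, key):
    """Treat a setting as enabled only if it is explicitly the string 'true'."""
    return conf.get(key, "false").strip().lower() == "true"


def check_dynamic_allocation(conf):
    """Return a list of problems with the dynamic-allocation settings.

    Hypothetical, simplified sanity check for illustration; not Spark's own validation.
    """
    if not is_true(conf, "spark.dynamicAllocation.enabled"):
        return []  # dynamic allocation is off, so there is nothing to check

    problems = []
    min_execs = int(conf.get("spark.dynamicAllocation.minExecutors", "0"))
    max_raw = conf.get("spark.dynamicAllocation.maxExecutors")
    # An unset maxExecutors is treated as unbounded.
    max_execs = float("inf") if max_raw is None else int(max_raw)

    if min_execs < 0:
        problems.append("minExecutors must not be negative")
    if max_execs <= 0:
        problems.append("maxExecutors must be positive")
    if min_execs > max_execs:
        problems.append(f"minExecutors ({min_execs}) exceeds maxExecutors ({max_raw})")

    # Removing executors must not lose shuffle data: serve it from the external
    # shuffle service, or (Spark 3.0+) let shuffle tracking keep executors that
    # still hold shuffle data. Other Spark 3 options are ignored for simplicity.
    if not (is_true(conf, "spark.shuffle.service.enabled")
            or is_true(conf, "spark.dynamicAllocation.shuffleTracking.enabled")):
        problems.append("enable spark.shuffle.service.enabled "
                        "(or spark.dynamicAllocation.shuffleTracking.enabled on Spark 3.0+)")
    return problems


if __name__ == "__main__":
    print(check_dynamic_allocation(SESSION_CONF))  # → []
```

When the list comes back empty, the same dict could be applied by looping builder = builder.config(key, value) over its items before getOrCreate().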
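spark.port.maxRetries is raised to 32 here (the default is 16). Per the Spark configuration docs, when a specific start port is set, each retry uses the previous port plus 1, so Spark tries the range from the start port up to port + maxRetries; with the UI's default 4040 that is 4040 through 4072. This helps when many drivers run on the same host. The sketch below imitates that walk with Python's socket module; bind_with_retries is a hypothetical helper, not Spark code.

```python
import socket


def bind_with_retries(start_port, max_retries, host="127.0.0.1"):
    """Try start_port, start_port + 1, ..., start_port + max_retries in turn.

    Hypothetical illustration of the behaviour described for spark.port.maxRetries;
    returns the first socket that binds successfully (the caller must close it).
    """
    last_error = None
    for offset in range(max_retries + 1):  # first attempt plus max_retries retries
        port = start_port + offset
        if port > 65535:
            break  # ran out of valid TCP ports
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen()
            return sock
        except OSError as err:
            sock.close()
            last_error = err
    raise OSError(f"no free port in {start_port}..{start_port + max_retries}") from last_error


if __name__ == "__main__":
    # Occupy one port, then ask for it again: the helper moves on to a later port.
    blocker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    blocker.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    blocker.listen()
    taken = blocker.getsockname()[1]
    sock = bind_with_retries(taken, max_retries=32)
    print(f"port {taken} busy, bound {sock.getsockname()[1]} instead")
    sock.close()
    blocker.close()
```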