@kovid-r
Last active October 11, 2022 04:49
PySpark Cheat Sheet: Application Initialization
import pyspark
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
# create a SparkSession instance with the name moviedb with Hive support enabled
# https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
spark = SparkSession.builder.appName("moviedb").enableHiveSupport().getOrCreate()
# get the SparkContext, the application's connection to the Spark cluster,
# which is coordinated by a cluster manager such as YARN, Mesos, Kubernetes, or Spark standalone;
# getOrCreate() returns the context already started by the SparkSession above
sc = SparkContext.getOrCreate()
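# create a SQLContext instance to access the SQL query engine built on top of Spark;
# its constructor takes a SparkContext first and, optionally, the SparkSession to wrap
# (SQLContext is deprecated since Spark 3.0; spark.sql(...) on the SparkSession covers the same use)
sqlContext = SQLContext(sc, spark)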