Last active: July 10, 2018 04:07
# Creating a Spark configuration and Spark context
from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("My Dataframe")
sc = SparkContext(conf=conf)

from pyspark.sql import SparkSession  # the DataFrame API lives in pyspark.sql
spark = SparkSession(sc)  # wrap the Spark context in a SparkSession

myRange = spark.range(1000).toDF("number")
# myRange is a Spark DataFrame with one column holding 1,000 rows, with values from 0 to 999.
# When run on a cluster, each partition of this range lives on a different executor.

# Let's perform a transformation-
divisBy2 = myRange.where("number % 2 = 0")  # `where` is an alias for `filter`