@garystafford
Last active September 29, 2019 14:33
#!/usr/bin/python
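# Load the BreadBasket_DMS.csv bakery transaction data into a Spark DataFrame,
# using an explicit schema instead of letting Spark infer the column types.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType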
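# Reuse the active SparkSession, or create one if none exists
spark = SparkSession \
    .builder \
    .getOrCreate()
sc = spark.sparkContext  # underlying SparkContext (not used further below)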
# Explicit schema: all four columns are nullable; only 'transaction' is numeric
bakery_schema = StructType([
    StructField('date', StringType(), True),
    StructField('time', StringType(), True),
    StructField('transaction', IntegerType(), True),
    StructField('item', StringType(), True)
])
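# Read the CSV. 'header' treats the first line as a header rather than data;
# the column names and types come from bakery_schema.
df3 = spark.read \
    .format('csv') \
    .option('header', 'true') \
    .load('BreadBasket_DMS.csv', schema=bakery_schema)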
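df3.show(10)  # print the first 10 rows
print(df3.count())  # count() only returns the row count, so print it when run as a script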