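# import the required libraries
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import Row
# reuse the running SparkContext (the PySpark shell exposes it as `sc`) or create one
sc = SparkContext.getOrCreate()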
# read the text data
raw_data = sc.textFile('sample_data_final_wh.txt').cache()
# get number of partitions
raw_data.getNumPartitions()
## >> 19 (the count depends on the file size and the Spark configuration)
# view the first 2 lines of the file
raw_data.take(2)
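
Since `take(2)` returns raw text lines, a likely next step is to split each line into fields. The snippet below is only a minimal sketch: the helper `parse_line`, the comma delimiter, and the sample values are assumptions for illustration, as the actual layout of `sample_data_final_wh.txt` is not shown here.

```python
def parse_line(line, sep=","):
    """Split one raw text line into a list of whitespace-stripped fields.

    The separator is an assumption; adjust it to the real file format.
    """
    return [field.strip() for field in line.split(sep)]


# Made-up stand-ins for the two lines that raw_data.take(2) would return.
sample_lines = ["a, 1, x", "b, 2, y"]
parsed = [parse_line(line) for line in sample_lines]
print(parsed)

# With a live SparkContext, the same function could be applied to the RDD:
# parsed_rdd = raw_data.map(parse_line)
```

Keeping the parsing logic in a plain function makes it easy to check on a couple of lines locally before mapping it over the whole RDD.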