@lakshay-arora
Last active January 13, 2020 04:03
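# import the required libraries
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import Row

# the pyspark shell already defines `sc`; getOrCreate() reuses it,
# and creates a new context when the code runs as a standalone script
sc = SparkContext.getOrCreate()

# read the text file into an RDD of lines and cache it for repeated use
raw_data = sc.textFile('sample_data_final_wh.txt').cache()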
# get number of partitions
raw_data.getNumPartitions()
## >> 19
# view the first 2 rows (take() returns them as a list)
raw_data.take(2)