Created October 15, 2019 12:00
Gist: lakshay-arora/5eb852f95a0d1f8dc39f506166f37638
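```python
from pyspark import SparkContext
# Reuse the running SparkContext (e.g. in the pyspark shell) or create one
sc = SparkContext.getOrCreate()
# Create an RDD from the text file with a suggested minimum of 4 partitions
my_text_file = sc.textFile('tokens_spark.txt', minPartitions=4)
# RDD object: printing it shows a description of the RDD, not the file contents
print(my_text_file)
# Convert to lower case (a lazy transformation that returns a new RDD)
my_text_file = my_text_file.map(lambda x: x.lower())
# Updated RDD object
print(my_text_file)
# Get the RDD lineage; in PySpark toDebugString() returns bytes, so decode it
print(my_text_file.toDebugString().decode('utf-8'))
```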
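Running the gist needs a Spark installation and the original `tokens_spark.txt`, so the sketch below models the same three ideas in plain Python: a text source split into a fixed number of partitions, a lazy `map` that only records a function, and a printed lineage. `ToyRDD`, `from_lines`, `compute_partitions`, `collect`, and `debug_string` are hypothetical names invented for this illustration; they are not PySpark APIs, and the lineage format only loosely imitates `toDebugString()`. It also assumes, for simplicity, that lines are split into even contiguous chunks, whereas Spark's `textFile` partitions by byte ranges of the input splits.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical toy model of an RDD (NOT the PySpark API).

    A ToyRDD holds either source partitions (a list of lists of lines) or a
    parent plus a per-element function, so map() is lazy and the chain of
    parents forms the lineage.
    """

    def __init__(self, name: str,
                 partitions: Optional[List[List[str]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 func: Optional[Callable[[str], str]] = None) -> None:
        self.name = name
        self._partitions = partitions
        self.parent = parent
        self.func = func

    @classmethod
    def from_lines(cls, lines: List[str], num_partitions: int) -> "ToyRDD":
        # Simplifying assumption: split lines into contiguous, roughly even chunks
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        size = -(-len(lines) // num_partitions)  # ceiling division
        chunks = [lines[i * size:(i + 1) * size] for i in range(num_partitions)]
        return cls("ToyTextFile", partitions=chunks)

    def map(self, f: Callable[[str], str]) -> "ToyRDD":
        # Lazy: only record the parent and the function; no data is touched here
        return ToyRDD("ToyMapped", parent=self, func=f)

    def num_partitions(self) -> int:
        # A narrow transformation such as map keeps its parent's partitioning
        if self.parent is None:
            return len(self._partitions)
        return self.parent.num_partitions()

    def compute_partitions(self) -> List[List[str]]:
        # Walk the lineage back to the source, applying each function per partition
        if self.parent is None:
            return [list(p) for p in self._partitions]
        return [[self.func(x) for x in part]
                for part in self.parent.compute_partitions()]

    def collect(self) -> List[str]:
        # An "action": only now is the recorded lineage actually evaluated
        return [x for part in self.compute_partitions() for x in part]

    def debug_string(self) -> str:
        # Indented lineage, newest RDD first, with partition counts in parentheses
        lines, node, depth = [], self, 0
        while node is not None:
            lines.append(f"{'  ' * depth}({node.num_partitions()}) {node.name}")
            node, depth = node.parent, depth + 1
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample lines standing in for tokens_spark.txt
    sample = ["Spark RDD", "Lazy Evaluation", "Lineage Graph", "Partitions"]
    lowered = ToyRDD.from_lines(sample, num_partitions=4).map(lambda x: x.lower())
    print(lowered.debug_string())  # lineage, printed before anything is computed
    print(lowered.collect())       # the action that finally runs the map
```

Printing the lineage before calling `collect()` mirrors the gist: up to `toDebugString()`, Spark has only built the RDD graph, and the lower-case conversion runs only when an action such as `collect()` or `count()` is invoked.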
Really nice. What is the content of your 'tokens_spark.txt' file?
Thanks in advance.
I used the code below to create the file, for anyone who may need it: