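from pyspark import SparkContext

# Reuse the active SparkContext (the pyspark shell already provides it as `sc`)
sc = SparkContext.getOrCreate()

## 1. Parallelizing an existing collection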
my_list = [1, 2, 3, 4, 5]
my_list_rdd = sc.parallelize(my_list)
## 2. Referencing an external data file
file_rdd = sc.textFile("path_of_file")