# Filesystem schemes and URIs
|=====================================|
| Filesystem     | URI Structure      |
|----------------|--------------------|
| Local FS       | file:///path       |
| HDFS           | hdfs://hdfs_path   |
| S3             | s3://bucket/object |
|=====================================|
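To make the URI shapes in the table concrete, here is a minimal plain-Python sketch (no Spark needed) that splits a URI into scheme, authority, and path using the standard library. The helper names `describe_uri` and `local_uri`, and the sample host `namenode:8020`, are illustrative assumptions, not part of Spark.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def describe_uri(uri):
    """Split a storage URI into (scheme, authority, path)."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


def local_uri(path):
    """Build a file:// URI from an absolute POSIX path (hypothetical helper)."""
    return PurePosixPath(path).as_uri()


# Local paths have an empty authority, hence the three slashes.
local_parts = describe_uri("file:///path")
# For HDFS the authority is usually the namenode host and port (assumed values here).
hdfs_parts = describe_uri("hdfs://namenode:8020/user/data")
# For S3 the authority is the bucket and the path is the object key.
s3_parts = describe_uri("s3://bucket/object")
```

The three slashes in `file:///path` are the `file://` prefix followed by an absolute path that itself starts with `/`.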
# Loading a file into an RDD
rdd = sc.textFile("file:///filename")
# Loading every file in a directory into an RDD (one record per line)
rdd = sc.textFile("file:///dir/")
# Loading a directory with multiple files
# as (filename, contents) tuples
rdd = sc.wholeTextFiles("file:///dir/")
# Loading all the CSVs from a directory
rdd = sc.textFile("file:///dir/*.csv")
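The difference between the line-oriented and whole-file loaders above can be sketched without a Spark cluster. The following is a rough plain-Python analogy, not Spark's implementation: `read_lines` and `read_whole_files` are hypothetical helpers, and `pathlib` globbing only approximates the glob syntax Spark accepts.

```python
import tempfile
from pathlib import Path


def read_lines(directory, pattern="*"):
    """Analogy for textFile: one record per line across all matching files."""
    lines = []
    for f in sorted(Path(directory).glob(pattern)):
        if f.is_file():
            lines.extend(f.read_text().splitlines())
    return lines


def read_whole_files(directory, pattern="*"):
    """Analogy for wholeTextFiles: one (uri, contents) pair per matching file."""
    return [
        (f.as_uri(), f.read_text())
        for f in sorted(Path(directory).glob(pattern))
        if f.is_file()
    ]


# Build a throwaway directory with two CSVs and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("1,2\n3,4\n")
    Path(tmp, "b.csv").write_text("5,6\n")
    Path(tmp, "notes.txt").write_text("ignore me\n")

    csv_lines = read_lines(tmp, "*.csv")        # lines from a.csv and b.csv only
    csv_files = read_whole_files(tmp, "*.csv")  # two (uri, contents) pairs
```

The line-oriented form loses track of which file a record came from, so the whole-file form is the better fit when the filename matters or when each file must be parsed as a unit.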