@siakon89
Created May 16, 2020 14:36
# Filesystem schemes and URIs
# |================|====================|
# | Filesystem     | URI structure      |
# |----------------|--------------------|
# | Local FS       | file:///path       |
# | HDFS           | hdfs://hdfs_path   |
# | S3             | s3://bucket/object |
# |================|====================|
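# A minimal sketch of working with the URI forms in the table above, using
# only the Python standard library. The helper names local_uri and uri_scheme
# are hypothetical (not part of Spark), and the example assumes POSIX-style
# absolute paths.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_uri(path: str) -> str:
    """Build a file:// URI from an absolute local path (hypothetical helper)."""
    # Path.as_uri() requires an absolute path and percent-encodes
    # characters such as spaces.
    return Path(path).as_uri()


def uri_scheme(uri: str) -> str:
    """Return the scheme of a URI, e.g. 'hdfs' for 'hdfs://hdfs_path'."""
    return urlparse(uri).scheme


if __name__ == "__main__":
    print(local_uri("/data/input.csv"))
    print(uri_scheme("s3://bucket/object"))
```

# Checking the scheme up front can make it easier to spot a path that was
# passed without file://, hdfs:// or s3:// before it reaches sc.textFile.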
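# Get (or reuse) a SparkContext; in the pyspark shell, sc already exists
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
# Loading a file into an RDD (one element per line)
rdd = sc.textFile("file:///filename")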
# Loading every file in a directory into an RDD (one element per line)
rdd = sc.textFile("file:///dir/")
# Loading a directory with multiple files
# as (filename, contents) tuples, one tuple per file
rdd = sc.wholeTextFiles("file:///dir/")
# Loading all the CSVs from a directory
rdd = sc.textFile("file:///dir/*.csv")
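# Before handing a glob such as file:///dir/*.csv to Spark, it can help to
# preview which local files it would match. The sketch below is a plain-Python
# approximation using the standard glob module; preview_local_matches is a
# hypothetical helper, it only handles file:// URIs, and Python's glob rules
# may differ from Spark's in edge cases (a "?" wildcard, for instance, would
# be parsed here as a URI query separator).

```python
import glob
from typing import List
from urllib.parse import urlparse


def preview_local_matches(uri: str) -> List[str]:
    """List local files matched by a file:// URI that may contain a glob."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        # hdfs:// and s3:// paths cannot be expanded with the local glob module
        raise ValueError("only file:// URIs can be previewed locally")
    # parsed.path is the filesystem path, e.g. '/dir/*.csv'
    return sorted(glob.glob(parsed.path))


if __name__ == "__main__":
    for path in preview_local_matches("file:///dir/*.csv"):
        print(path)
```

# An empty result here suggests the pattern or directory is wrong, which is
# cheaper to find out locally than from a failed Spark job.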