@fyyying
Last active June 28, 2020 20:14
from pyspark import SparkFiles
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "https://gist.githubusercontent.com/fyyying/4aa5b471860321d7b47fd881898162b7/raw/e8606de9a82e13ca6215b340ce260dad60469cba/titanic_dataset.csv"
# register the remote file with Spark so that SparkFiles.get can resolve it by file name
spark.sparkContext.addFile(path)
# read in the csv file; header and inferSchema are CSV options
df = spark.read.format('csv').load(SparkFiles.get("titanic_dataset.csv"), header=True, inferSchema=True)
# JSON and Parquet are read the same way if the path points to such a file; the header/inferSchema options do not apply to them
df = spark.read.format('json').load(SparkFiles.get("titanic_dataset.json"))
df = spark.read.format('parquet').load(SparkFiles.get("titanic_dataset.parquet"))
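
Since the three reads above differ only in the format name and its options, the choice can be driven by the file extension. The following is a minimal sketch, not part of the original gist: `READERS`, `reader_for`, and `read_any` are hypothetical names introduced here, and the options attached to each format are an assumption about what is wanted for this dataset.

```python
import os

# Hypothetical mapping (not from the original gist): extension -> (format, reader options).
# Only CSV needs header/inferSchema; JSON and Parquet are loaded with default options.
READERS = {
    ".csv": ("csv", {"header": True, "inferSchema": True}),
    ".json": ("json", {}),
    ".parquet": ("parquet", {}),
}


def reader_for(path):
    """Return (format, options) for a file path based on its extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in READERS:
        raise ValueError(f"unsupported file extension: {ext!r}")
    return READERS[ext]


def read_any(spark, local_path):
    """Load local_path into a DataFrame using the reader that matches its extension.

    `spark` is assumed to be an existing SparkSession.
    """
    fmt, options = reader_for(local_path)
    return spark.read.format(fmt).load(local_path, **options)
```

With the session and file registration from the snippet above, usage would look like `df = read_any(spark, SparkFiles.get("titanic_dataset.csv"))`.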