@kovid-r
Last active October 11, 2022 04:49
Reading Different File Formats PySpark Cheatsheet
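from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuse the active session or start a new one

# Reading a csv file. The generic format(...).load(...) pattern works for every format below;
# the options passed to it are format-specific (header and inferSchema apply to CSV).
# Format names and boolean option values are case-insensitive, so these variants are equivalent.
df = spark.read.csv(csv_file_path)  # no header: columns named _c0, _c1, ... and read as strings
df = spark.read.format('csv').options(header=True, inferSchema=True).load(csv_file_path)
df = spark.read.format('csv').options(header='True', inferSchema='True').load(csv_file_path)
df = spark.read.format('CSV').options(header='true', inferSchema='true').load(csv_file_path)
df = spark.read.csv(csv_file_path, header=True)
df = spark.read.csv(csv_file_path, header='true')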
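Without inferSchema, Spark reads every CSV column as a string; with it, Spark makes an extra pass over the data to guess column types. The sketch below is a deliberately simplified, stdlib-only model of that idea, not Spark's actual inference (which also handles longs, decimals, booleans, timestamps and more). The function `infer_column_types` is hypothetical: it picks `int` if every non-empty value parses as an integer, else `double` if every value parses as a float, else `string`.

```python
import csv
import io


def _parses_as(value: str, cast) -> bool:
    """Return True if cast(value) succeeds without raising ValueError."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


def infer_column_types(csv_text: str) -> dict:
    """Guess 'int', 'double' or 'string' per column (simplified, hypothetical model)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # like header=True: the first row supplies column names
    rows = list(reader)
    types = {}
    for i, name in enumerate(header):
        # Empty cells are treated as nulls and ignored during inference.
        values = [row[i] for row in rows if i < len(row) and row[i] != ""]
        if values and all(_parses_as(v, int) for v in values):
            types[name] = "int"
        elif values and all(_parses_as(v, float) for v in values):
            types[name] = "double"
        else:
            types[name] = "string"  # fallback, as when inferSchema is off
    return types


sample = "id,price,name\n1,9.99,apple\n2,,pear\n3,4,plum\n"
print(infer_column_types(sample))  # {'id': 'int', 'price': 'double', 'name': 'string'}
```

The extra pass is why inferSchema=True can be noticeably slower on large files than reading everything as strings.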
# Reading a json file (expects JSON Lines, one object per line; pass multiLine=True for a multi-line document)
df = spark.read.json(json_file_path)
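By default spark.read.json expects JSON Lines, i.e. one complete JSON object per line; a pretty-printed document spanning several lines needs spark.read.json(json_file_path, multiLine=True). The hypothetical helper `parse_json_lines` below is a stdlib-only sketch of line-by-line parsing (not PySpark code) that shows why: no single line of an indented document is valid JSON on its own.

```python
import json


def parse_json_lines(text: str):
    """Parse one JSON value per line; return (records, bad_lines)."""
    records, bad_lines = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Spark's default PERMISSIVE mode does not fail here; it reports such lines as corrupt records.
            bad_lines.append(line)
    return records, bad_lines


json_lines = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
pretty = json.dumps({"id": 1, "name": "a"}, indent=2)

print(parse_json_lines(json_lines))  # ([{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}], [])
print(parse_json_lines(pretty))      # ([], ['{', '  "id": 1,', '  "name": "a"', '}'])
```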
# Reading a text file (one row per line, in a single string column named 'value')
df = spark.read.text(text_file_path)
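spark.read.text returns a DataFrame with a single string column named `value`, one row per line; with wholetext=True each file becomes a single row instead. As a mental model only, the hypothetical function `text_to_rows` below reproduces that shape with plain Python dictionaries. It splits with str.splitlines(), which recognizes a few more line terminators than Spark's default `\n`, `\r` and `\r\n`.

```python
def text_to_rows(text: str, wholetext: bool = False) -> list:
    """Model the shape of spark.read.text output as a list of {'value': ...} rows."""
    if wholetext:
        # wholetext=True: the whole file becomes a single row.
        return [{"value": text}]
    # Default: one row per line, with the line terminator stripped.
    return [{"value": line} for line in text.splitlines()]


content = "first line\nsecond line\n"
print(text_to_rows(content))
# [{'value': 'first line'}, {'value': 'second line'}]
print(text_to_rows(content, wholetext=True))
# [{'value': 'first line\nsecond line\n'}]
```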
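# Reading a parquet file
# load() without format() falls back to spark.sql.sources.default, which is 'parquet' unless changed
df = spark.read.load(parquet_file_path)
df = spark.read.parquet(parquet_file_path)  # equivalent, explicit form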
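Because load() without format() assumes Parquet, pointing it at a file in another format typically fails with a read error. A quick way to check whether a file really is Parquet is its magic number: Parquet files begin and end with the 4 bytes `PAR1`. The helper `looks_like_parquet` below is a hypothetical stdlib sketch of that check, not a validator. Spark usually writes Parquet output as a directory of part files, so the check applies to the individual part files rather than the directory.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def looks_like_parquet(path: str) -> bool:
    """Cheap sanity check: True if the file has Parquet's leading and trailing magic bytes."""
    n = len(PARQUET_MAGIC)
    if os.path.getsize(path) < 2 * n:
        return False  # too short to hold both magic markers
    with open(path, "rb") as f:
        head = f.read(n)
        f.seek(-n, os.SEEK_END)
        tail = f.read(n)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# Usage: a stand-in file with only the magic markers passes; a CSV does not.
with tempfile.TemporaryDirectory() as d:
    fake = os.path.join(d, "part-00000.parquet")
    with open(fake, "wb") as f:
        f.write(PARQUET_MAGIC + b"\x00" * 4 + PARQUET_MAGIC)
    csv_path = os.path.join(d, "data.csv")
    with open(csv_path, "w") as f:
        f.write("id,name\n1,a\n")
    print(looks_like_parquet(fake), looks_like_parquet(csv_path))  # True False
```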
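# Reading a Delta Lake table (a directory, not a single file);
# requires the Delta Lake package (e.g. delta-spark) to be configured on the Spark session
df = spark.read.format("delta").load(delta_lake_file_path)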
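A Delta Lake table is a directory of Parquet data files plus a `_delta_log` subdirectory whose JSON commit files are named by a zero-padded 20-digit version number (00000000000000000000.json, 00000000000000000001.json, ...). A version found this way can be passed to Delta's time-travel reader option, e.g. spark.read.format("delta").option("versionAsOf", 0).load(delta_lake_file_path). The hypothetical helper `latest_delta_version` below is a simplified stdlib sketch that finds the highest committed version by scanning the log; it ignores checkpoint files and is not how the Delta reader itself resolves a snapshot.

```python
import os
import re
import tempfile

# Commit files in _delta_log are named by a zero-padded, 20-digit version number.
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def latest_delta_version(table_path: str) -> int:
    """Return the highest commit version in <table_path>/_delta_log, or -1 if none."""
    log_dir = os.path.join(table_path, "_delta_log")
    if not os.path.isdir(log_dir):
        return -1  # no transaction log: not a Delta table
    versions = []
    for name in os.listdir(log_dir):
        match = COMMIT_FILE.match(name)
        if match:  # skips checkpoints, _last_checkpoint, .crc files, etc.
            versions.append(int(match.group(1)))
    return max(versions, default=-1)


# Usage: fake a log with commits 0..2 plus a checkpoint file.
with tempfile.TemporaryDirectory() as table:
    log = os.path.join(table, "_delta_log")
    os.makedirs(log)
    for name in [f"{v:020d}.json" for v in range(3)] + ["00000000000000000002.checkpoint.parquet"]:
        open(os.path.join(log, name), "w").close()
    print(latest_delta_version(table))  # 2
```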