@lindacmsheard
Created July 7, 2021 18:41
Create a pyspark dataframe of filepaths by reading a directory in databricks
# Create a DataFrame from file paths

directory = '/path/on/dbfs'

file_paths = dbutils.fs.ls(directory)

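# e.g. inspect the first entry (each entry exposes .path and .name)
print(file_paths[0].path)
print(file_paths[0].name)

# `spark` and `dbutils` are predefined in Databricks notebooks
files_df = spark.createDataFrame([(f.path, f.name) for f in file_paths], ["path", "name"])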
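The row-building step can be pulled into a small helper so it can be checked outside a notebook. The following is a minimal sketch, not part of the original gist: `to_rows` and `FakeFileInfo` are hypothetical names, the sample entries are made up, and it assumes each entry returned by `dbutils.fs.ls` exposes `.path`, `.name` and `.size`.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.ls
# (assumption: each entry exposes .path, .name and .size).
FakeFileInfo = namedtuple("FakeFileInfo", ["path", "name", "size"])


def to_rows(entries):
    """Convert directory entries into (path, name, size) tuples."""
    return [(e.path, e.name, e.size) for e in entries]


# Made-up sample data in place of dbutils.fs.ls('/path/on/dbfs')
sample = [
    FakeFileInfo("dbfs:/path/on/dbfs/a.csv", "a.csv", 120),
    FakeFileInfo("dbfs:/path/on/dbfs/b.csv", "b.csv", 340),
]
rows = to_rows(sample)

# In a Databricks notebook, where `spark` and `dbutils` are predefined:
# files_df = spark.createDataFrame(
#     to_rows(dbutils.fs.ls(directory)), ["path", "name", "size"]
# )
```

Note that an empty directory yields an empty list, and `spark.createDataFrame` cannot infer column types from empty data when given only column names, so that case likely needs a guard or an explicit schema.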