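# Using Python
import os
import zipfile

# Open the archive and recreate its directory entries (names ending in '/').
# Paths are created relative to the current working directory.
with zipfile.ZipFile('/databricks/driver/D-Dfiles.zip') as z:
    for f in z.namelist():
        if f.endswith('/'):
            os.makedirs(f, exist_ok=True)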
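The loop above only recreates the directory entries of the archive. If the goal is to write the files out as well, a minimal sketch could rely on `ZipFile.extractall`, which creates intermediate directories itself. The helper name `unzip_to` and the demo archive contents are hypothetical and not part of the original gist.

```python
import os
import tempfile
import zipfile

def unzip_to(zip_path, dest_dir):
    """Extract every member of zip_path under dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as z:
        z.extractall(dest_dir)  # creates intermediate directories as needed
        return z.namelist()

# Usage on a throwaway archive (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as z:
        z.writestr("logs/well_1.las", "~Version\n")
    names = unzip_to(demo_zip, os.path.join(tmp, "out"))
    extracted_ok = os.path.isfile(os.path.join(tmp, "out", "logs", "well_1.las"))
```

On Databricks the same call would presumably take the driver path from the snippet above as `zip_path`; the destination directory is left for the reader to choose.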
# Reading zipped folder data in Pyspark
import io
import zipfile

def zip_extract(x):
    # x is a (path, bytes) pair as produced by sc.binaryFiles
    in_memory_data = io.BytesIO(x[1])
    with zipfile.ZipFile(in_memory_data, "r") as file_obj:
        # Map each archive member name to its raw bytes
        return {name: file_obj.read(name) for name in file_obj.namelist()}

# sc is the SparkContext that Databricks notebooks provide
zips = sc.binaryFiles("dbfs:/mnt/vedant-demo/ONG/data/las_raw/D-Dfiles.zip")
files_data = zips.map(zip_extract)
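To see what `zip_extract` returns without a cluster, the function can be exercised on an in-memory archive. This is a sketch only: `make_zip_bytes`, the record path, and the member names are hypothetical, and the tuple mimics the `(path, bytes)` pairs that `sc.binaryFiles` yields.

```python
import io
import zipfile

def zip_extract(x):
    # x is a (path, bytes) pair; return {member name: member bytes}
    with zipfile.ZipFile(io.BytesIO(x[1]), "r") as file_obj:
        return {name: file_obj.read(name) for name in file_obj.namelist()}

def make_zip_bytes(members):
    """Build a ZIP archive in memory from a {name: bytes} mapping (test helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, data in members.items():
            z.writestr(name, data)
    return buf.getvalue()

# A stand-in for one record of the binaryFiles RDD (hypothetical path and contents)
record = ("dbfs:/hypothetical/sample.zip",
          make_zip_bytes({"a.las": b"alpha", "sub/b.las": b"beta"}))
result = zip_extract(record)
```

On the cluster, something like `zips.flatMap(lambda x: zip_extract(x).items())` would give one `(name, bytes)` record per archive member instead of one dict per archive. Note that `binaryFiles` reads each file as a single record, so every archive has to fit in executor memory.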
@priyankabarua88

Thanks, can the same command be used to unzip a rar file?
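
On the RAR question: Python's standard `zipfile` module reads only the ZIP format, so `zip_extract` would raise `zipfile.BadZipFile` on a `.rar` payload. A third-party package such as `rarfile` may be an option, but it is not used in the gist and is not shown here. A minimal guard, assuming you want to skip non-ZIP payloads rather than fail the job, could look like this (the helper name `is_zip_payload` and the sample bytes are hypothetical):

```python
import io
import zipfile

def is_zip_payload(data):
    """Return True only if the bytes look like a ZIP archive."""
    return zipfile.is_zipfile(io.BytesIO(data))

# Build a real ZIP payload in memory
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("a.txt", "hello")
zip_bytes = buf.getvalue()

# Bytes that start with the RAR signature but are not a ZIP archive
rar_like_bytes = b"Rar!\x1a\x07\x00" + b"\x00" * 32

zip_ok = is_zip_payload(zip_bytes)
rar_ok = is_zip_payload(rar_like_bytes)
```

In the Spark snippet this check could be applied with `zips.filter(lambda x: is_zip_payload(x[1]))` before the `map` call.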
