Karthik Ravindra karthikbgl

karthikbgl / spark_dataframe_to_single_file_csv_using_hdfs.py
Created March 8, 2018 00:31
Saves a Spark DataFrame into a single CSV/delimited file efficiently. Assumes the file storage is HDFS.
import subprocess


def write_to_local_fs(df):
    """
    Write a DataFrame to the local filesystem efficiently, without using coalesce or repartition.

    The idea is to persist the data in its cluster (partitioned) format in HDFS (or whatever
    file storage is in use), then write it to the local filesystem.
    The header is written to the file on the local filesystem.

    :param df: the DataFrame to write
    """