
@hasanisaeed
Last active October 7, 2021 04:31
Split .csv files by rows.
import os


def split_and_save_df(df, name='NoName', chunk_size=100, output_dir=''):
    """Split a DataFrame by rows and save each chunk to a separate CSV file.

    Parameters:
        df : pandas DataFrame (e.g. df = pd.read_csv('file.csv'))
        name : prefix for the output file names
        chunk_size : maximum number of rows per output file
        output_dir : directory in which to write the chunk files
    """
    if output_dir:
        # Create the output directory if it does not exist yet.
        os.makedirs(output_dir, exist_ok=True)
    n_rows = df.shape[0]
    for start in range(0, n_rows, chunk_size):
        # Position of the last row in this chunk (inclusive), clipped to the frame.
        end = min(start + chunk_size, n_rows) - 1
        # Position-based slicing works for any index, not only a default RangeIndex.
        subset = df.iloc[start:end + 1]
        output_path = os.path.join(output_dir, f"{name}_{start}_{end}.csv")
        print(f">> Going to write into {output_path}")
        subset.to_csv(output_path)
        output_size = os.stat(output_path).st_size
        print(f">> Wrote {output_size} bytes")
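The function above expects the whole file to be loaded into a DataFrame first. For CSV files too large for that, the same row-based split can be done while reading, using the `chunksize` parameter of pandas' `read_csv`. The sketch below is a hypothetical variant (the name `split_csv_file` and its return value are assumptions, not part of the original gist); unlike the original, it writes with `index=False` so the reader's running index is not added as an extra column.

```python
import os
import tempfile

import pandas as pd


def split_csv_file(input_path, name='NoName', chunk_size=100, output_dir=''):
    """Hypothetical streaming variant: split a CSV file into files holding at
    most chunk_size data rows each, without loading the whole file at once.

    Returns the list of paths written, in order.
    """
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)
    written = []
    start = 0
    # read_csv(chunksize=...) yields DataFrames of up to chunk_size rows each.
    for chunk in pd.read_csv(input_path, chunksize=chunk_size):
        end = start + len(chunk) - 1  # inclusive position of the chunk's last row
        output_path = os.path.join(output_dir, f"{name}_{start}_{end}.csv")
        # index=False: do not write the running index as an extra column.
        chunk.to_csv(output_path, index=False)
        written.append(output_path)
        start = end + 1
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.csv")
        pd.DataFrame({"x": range(250)}).to_csv(src, index=False)
        paths = split_csv_file(src, name="data", chunk_size=100, output_dir=tmp)
        print([os.path.basename(p) for p in paths])
        # → ['data_0_99.csv', 'data_100_199.csv', 'data_200_249.csv']
```

Each output file keeps the header row, so every chunk can be read back on its own with `pd.read_csv`.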