
@BrunoGomesCoelho
Created July 21, 2019 22:06
Read and concatenate multiple pandas dataframes in parallel. All credits to @zemekeneng on Stack Overflow.
import pandas as pd
from multiprocessing import Pool  # for reading the CSVs faster


def my_read_csv(filename):
    # Helper function for the parallel load_csvs
    return pd.read_csv(filename)


def load_csvs(prefix):
    """Reads and joins all our CSV files into one big dataframe.

    We do it in parallel to make it faster, since otherwise it takes some time.
    Idea from: https://stackoverflow.com/questions/36587211/easiest-way-to-read-csv-files-with-multiprocessing-in-pandas
    """
    # DATA_PATH is assumed to be defined elsewhere in the module
    file_list = [f"{DATA_PATH}/{prefix}{idx}.csv" for idx in range(1, 21)]
    # Set up a worker pool (one process per CPU core by default) and
    # make sure it is shut down cleanly when the reads finish
    with Pool() as pool:
        df_list = pool.map(my_read_csv, file_list)
    # Reduce the list of dataframes to a single dataframe
    return pd.concat(df_list, ignore_index=True)
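
For reference, a minimal usage sketch; the DATA_PATH value and the "train" prefix here are hypothetical stand-ins for wherever the twenty CSVs actually live. On platforms that spawn rather than fork worker processes (Windows, and macOS on recent Python versions), the call should sit under an if __name__ == "__main__": guard so the workers can re-import the module safely:

DATA_PATH = "data"  # hypothetical directory holding train1.csv ... train20.csv

if __name__ == "__main__":
    df = load_csvs("train")  # one dataframe with all 20 files concatenated
    print(df.shape)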