@magic-lantern
Last active April 17, 2019 20:59
Small Python 3 script to show how to use multiprocessing for parallel processing of data
import pandas as pd
import numpy as np
import multiprocessing
from multiprocessing import Pool
num_processes = multiprocessing.cpu_count()
# cpu_count() above counts logical cores; for CPU-intensive tasks, the number of
# physical cores is often a better choice. If psutil is installed, uncomment:
# import psutil
# num_processes = psutil.cpu_count(logical=False)
num_partitions = num_processes * 2  # smaller batches to get more frequent status updates
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
# put your code to parallelize processing of partitions of df here
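def process_df(my_df):
    print("received df", my_df.shape)

# the guard keeps workers from re-running this block where they start by re-importing the module (e.g. Windows)
if __name__ == "__main__":
    with Pool(processes=num_processes) as pool:
        df_split = np.array_split(df, num_partitions)
        pool.map(process_df, df_split)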
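The script above only prints the shape of each partition. If the per-partition function should return data, one possible approach is sketched below: split the row positions, slice with iloc, and concatenate whatever pool.map returns. The names split_frame and add_row_total, the "total" column, and the fixed counts of 2 processes and 4 partitions are assumptions made for illustration; they are not part of the original gist.

```python
import multiprocessing

import numpy as np
import pandas as pd


def split_frame(frame, n_parts):
    """Split a DataFrame row-wise into n_parts chunks of near-equal size."""
    # Split the row positions, then slice with iloc, so each chunk
    # is still a DataFrame with its original index labels.
    positions = np.array_split(np.arange(len(frame)), n_parts)
    return [frame.iloc[idx] for idx in positions]


def add_row_total(part):
    """Hypothetical per-partition work: add a column with each row's sum."""
    part = part.copy()  # leave the caller's data untouched
    part["total"] = part[["A", "B", "C", "D"]].sum(axis=1)
    return part


if __name__ == "__main__":
    df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
    with multiprocessing.Pool(processes=2) as pool:
        # pool.map returns the results in the same order as the input chunks
        parts = pool.map(add_row_total, split_frame(df, 4))
    result = pd.concat(parts)
    print(result.shape)
```

Because pool.map preserves input order, the concatenated result has its rows in the same order as df. Slicing by position is used here instead of passing the DataFrame directly to np.array_split, which emits a deprecation warning on some recent pandas versions.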