@AfonsoTsukamoto
Last active September 25, 2015 17:39
A function providing the minimal setup needed for parallel operations with Python's multiprocessing
import multiprocessing
# Parallel
# How it works:
# Given the number of items to process, work out how many items each core
# gets (e.g. 100 items on 2 cores = 50 per core).
# Then split the items into *number of cores* chunks and spread them across a
# subprocess pool.
# To split the set, we take the interval [0, set_size) and slide it up the
# original items collection *number of cores* times.
# [Return] tuple with: a pool of subprocesses ready for `map`,
#                      the split dataset
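def split_dataset(data, set_size, cores):
    # Slice data into `cores` consecutive chunks of set_size items each.
    splitted_dataset = []
    for i in range(0, cores):
        start = i * set_size
        # The last chunk also takes the leftovers when the items do not divide evenly.
        end = (i + 1) * set_size if i < cores - 1 else len(data)
        splitted_dataset.append(data[start:end])
    return splitted_dataset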
def parallelize(data):
    number_of_cores = multiprocessing.cpu_count()
    # Integer division: the chunk size is used as a slice index.
    set_size = len(data) // number_of_cores
    splitted = split_dataset(data, set_size, number_of_cores)
    pool = multiprocessing.Pool(number_of_cores)
    return (pool, splitted)
# For tests
import time
import random
def func(numb):
    # numb is one chunk of the dataset; sleep a random time to simulate work.
    print("processing %s" % numb)
    val = random.randint(1, numb[0] + 2)
    time.sleep(int(val))
    print("ended processing %s" % numb)
if __name__ == '__main__':
    data = list(range(0, 100))
    pool, dataset = parallelize(data)
    pool.map(func, dataset)
    pool.close()
    pool.join()
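For comparison, here is a minimal Python 3 sketch of the same split-then-map idea. It is not part of the original gist: `chunk_evenly`, `sum_chunk` and `parallel_sum` are hypothetical names chosen for illustration, and the per-chunk work (a plain sum) is only a stand-in. The sketch assumes that leftover items should be spread over the first chunks, so that chunk sizes differ by at most one, and that the pool can be shut down as soon as `map` returns.

```python
import multiprocessing


def chunk_evenly(data, parts):
    """Split data into `parts` consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks each take one leftover item.
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_chunk(chunk):
    # Hypothetical worker: stands in for whatever per-chunk processing is needed.
    return sum(chunk)


def parallel_sum(data, workers=None):
    workers = workers or multiprocessing.cpu_count()
    chunks = chunk_evenly(data, workers)
    # The context manager shuts the pool down once map() has returned.
    with multiprocessing.Pool(workers) as pool:
        return sum(pool.map(sum_chunk, chunks))


if __name__ == '__main__':
    print(parallel_sum(list(range(100))))
```

When there are more workers than items, some chunks come back empty, so a worker that indexes into its chunk (as `func` does with `numb[0]`) would need to handle that case.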