chumo / parallel_groupby_apply.py
Created June 17, 2016 13:14
Parallelize apply after pandas groupby using PySpark
import pandas as pd
# Spark context
import pyspark
sc = pyspark.SparkContext()
# apply parallel
def applyParallel(dfGrouped, func):
    # RDD with the group of DataFrames