
@marcusrehm
Last active September 25, 2020 15:16
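Function to union (append the rows of) several Spark DataFrames that share a schema, using `SparkContext.union` on their underlying RDDs in a single call to improve performance over chaining pairwise unions.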
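import org.apache.spark.sql.{DataFrame, SparkSession}

// Unions all DataFrames in one step through their underlying RDDs.
// Assumes every DataFrame has the same schema as the first one.
def unionAll(dfs: Seq[DataFrame]): DataFrame = {
  require(dfs.nonEmpty, "unionAll needs at least one DataFrame")
  val spark = SparkSession.builder().getOrCreate()
  spark.createDataFrame(
    spark.sparkContext.union(dfs.map(_.rdd)),
    dfs.head.schema
  )
}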
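
A minimal usage sketch, assuming a local Spark session and three small DataFrames with identical columns. The object name `UnionAllExample`, the column names `id` and `label`, and the sample rows are hypothetical and only serve the illustration. Note that the schema of the first DataFrame is applied to all rows, so the inputs are matched by column position, not by name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionAllExample {
  // Same function as in the gist, repeated so the example is self-contained.
  def unionAll(dfs: Seq[DataFrame]): DataFrame = {
    require(dfs.nonEmpty, "unionAll needs at least one DataFrame")
    val spark = SparkSession.builder().getOrCreate()
    spark.createDataFrame(
      spark.sparkContext.union(dfs.map(_.rdd)),
      dfs.head.schema
    )
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration purposes only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-all-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: three DataFrames with the same columns.
    val parts: Seq[DataFrame] = Seq(
      Seq((1, "a"), (2, "b")).toDF("id", "label"),
      Seq((3, "c"), (4, "d")).toDF("id", "label"),
      Seq((5, "e")).toDF("id", "label")
    )

    val combined = unionAll(parts)
    combined.show()

    // For comparison, the chained DataFrame API version of the same union.
    val chained = parts.reduce(_ union _)
    println(s"rows via unionAll: ${combined.count()}, rows via reduce: ${chained.count()}")

    spark.stop()
  }
}
```

Both approaches keep duplicate rows. The chained `reduce(_ union _)` version stays inside the DataFrame API, while `unionAll` round-trips through RDDs, so which one is faster for a given workload is worth measuring before relying on it.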