@eavidan
Created October 26, 2017 11:03
Working on Spark DataFrame partitions: for each partition, the following builds a Map of columns (key: column name, value: list of that column's values) from the Rows the DataFrame supplies to that partition.
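import org.apache.spark.sql.Row
import scala.collection.mutable

// Assumes an existing DataFrame `df`; column names are read once on the driver.
val names = df.columns.toList
println(names)

// Typing the parameter as Iterator[Row] avoids the overload ambiguity with
// ForeachPartitionFunction that Scala 2.12+ reports for an untyped lambda.
df.foreachPartition { (rows: Iterator[Row]) =>
  // One buffer per column, keyed by column name (ListBuffer appends in constant time, unlike List :+).
  val cols = mutable.Map(names.map(n => n -> mutable.ListBuffer.empty[Any]): _*)
  rows.foreach { row =>
    names.zip(row.toSeq).foreach { case (name, value) => cols(name) += value }
  }
  // This runs on the executors, so the output goes to executor stdout, not the driver console.
  println(cols.map { case (name, values) => name -> values.toList })
}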
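The row-to-column transposition itself does not depend on Spark, so it can be pulled out and checked on its own. The sketch below is a minimal, Spark-free version under a stated assumption: each row is a `Seq[Any]` standing in for `Row.toSeq`. `PartitionColumns` and `toColumns` are hypothetical names used only for illustration, not Spark APIs.

```scala
import scala.collection.mutable

// Hypothetical helper: transposes rows (each a Seq[Any], standing in for Row.toSeq)
// into a map from column name to that column's values, in row order.
object PartitionColumns {
  def toColumns(names: List[String], rows: Iterator[Seq[Any]]): Map[String, List[Any]] = {
    // Start every column with an empty buffer so empty partitions still list all columns.
    val cols = mutable.Map(names.map(n => n -> mutable.ListBuffer.empty[Any]): _*)
    rows.foreach { row =>
      // zip stops at the shorter side, matching names.zip(row.toSeq) in the snippet above.
      names.zip(row).foreach { case (name, value) => cols(name) += value }
    }
    // Freeze the buffers into an immutable Map[String, List[Any]].
    cols.map { case (name, buf) => name -> buf.toList }.toMap
  }
}
```

Inside `foreachPartition`, this could plausibly be called as `PartitionColumns.toColumns(names, rows.map(_.toSeq))`. Keeping the helper in a top-level object means only `names` is captured by the closure, and the function can be tested without a SparkSession.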