@YordanGeorgiev
Last active November 30, 2018 10:18
[iterate over rdd rows] How to iterate over the rows of a DataFrame's RDD and get a DataFrame back in Scala Spark #scala #spark
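// Note: if the per-row logic can be written as withColumn + a UDF instead,
// that has usually been over 10x faster than the RDD round trip below.
// Assumes a SparkSession named `spark` and an input DataFrame `inDf` (e.g. in spark-shell).
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

val rddRows: RDD[Row] = inDf.rdd.map { row =>
  // copy the row's values into an immutable List
  val lstRow = row.toSeq.toList
  // do stuff on lstRow here and bind the result to lstRowNew;
  // the new values must still match inDf.schema (same count, order and types)
  val lstRowNew = lstRow
  Row.fromSeq(lstRowNew)
}
// rebuild a DataFrame from the transformed rows, reusing the original schema
val dfOut = spark.createDataFrame(rddRows, inDf.schema)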
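The "do stuff" step in the snippet is left open. As one hypothetical illustration (the names `RowValueSketch` and `transformValues` are made up for this sketch), the pure-Scala function below upper-cases every `String` value of a row and passes every other value, including nulls, through unchanged. Because it keeps the number, order and types of the values, its output still fits `inDf.schema` and can go straight into `spark.createDataFrame(rddRows, inDf.schema)`.

```scala
object RowValueSketch {
  // Hypothetical per-row transformation for the "do stuff" step:
  // upper-cases String values and leaves all other values (including nulls) as they are,
  // so the result has the same length, order and types as the input row.
  def transformValues(values: Seq[Any]): Seq[Any] =
    values.map {
      case s: String => s.toUpperCase // null never matches this case
      case other     => other
    }
}
```

Inside the map it would be called as `Row.fromSeq(RowValueSketch.transformValues(row.toSeq))`.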
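The note at the top recommends withColumn + a UDF where the logic allows it. A minimal sketch of that style, assuming Spark 2.x or 3.x on the classpath and a hypothetical string column that should be upper-cased (`WithColumnUdfSketch`, `toUpper` and `upperCaseColumn` are illustrative names, not part of the original gist), could look like this:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object WithColumnUdfSketch {
  // Hypothetical per-value logic: upper-case a string, keeping nulls as null.
  val toUpper = udf((s: String) => if (s == null) null else s.toUpperCase)

  // Replaces column `colName` with its upper-cased version;
  // withColumn keeps the column in its original position and leaves the other columns untouched.
  def upperCaseColumn(inDf: DataFrame, colName: String): DataFrame =
    inDf.withColumn(colName, toUpper(col(colName)))
}
```

For this particular job the built-in `org.apache.spark.sql.functions.upper` would avoid the UDF entirely. Built-in functions are generally faster still, because a UDF is opaque to the Catalyst optimizer while built-ins are not.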