@AleksandrPavlenko
Created December 30, 2016 13:59
Spark 2.0 does not support a Scala `None` value in a `Row`; use `null` for missing fields instead.
import org.apache.spark.sql._
import org.apache.spark.sql.types._
// Note: in Spark 2.0, SQLContext is deprecated in favor of SparkSession
val sqlContext = new SQLContext(sc)
val schema = StructType(List(
  StructField(name = "surrogate_key", dataType = LongType),
  StructField(name = "family", dataType = StringType),
  StructField(name = "time", dataType = LongType),
  StructField(name = "user_id", dataType = StringType),
  StructField(name = "temperature_k", dataType = DoubleType)))
// Works fine with null value in Row
val rowWithNull = sc.makeRDD(Seq(Row(2L, null, 111222L, "test", 123.123)))
val dfWithNull = sqlContext.createDataFrame(rowWithNull, schema)
dfWithNull.collect()
// Fails at collect() with a RuntimeException along the lines of
// "scala.None$ is not a valid external type for schema of string"
val rowWithNone = sc.makeRDD(Seq(Row(2L, None, 111222L, "test", 123.123)))
val dfWithNone = sqlContext.createDataFrame(rowWithNone, schema)
dfWithNone.collect()
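A common workaround (a sketch, not part of the original gist) is to unwrap any `Option` values into plain values or `null` before constructing the `Row`. The `denull` helper name below is hypothetical:

```scala
// Hypothetical helper: Spark's Row encoding accepts null but rejects scala.None,
// so flatten Option values to their payload (or null) before building Rows.
def denull(v: Any): Any = v match {
  case Some(x) => x    // unwrap a present Option
  case None    => null // None becomes SQL NULL
  case x       => x    // plain values pass through unchanged
}

// Sketch of usage against the schema above:
// val safeRow = Row(Seq(2L, None, 111222L, "test", 123.123).map(denull): _*)
```

With the values pre-processed this way, `createDataFrame` and `collect()` behave the same as the `null` case shown above.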