@pavlov99
Last active October 3, 2016 02:55
Apache Spark: boolean operations with null handling
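// Run in spark-shell, where `sc` and the implicits behind toDF and $"..." are in scope
// (use :paste if your REPL rejects continuation lines that start with a dot).
// None becomes a SQL NULL in the Boolean column.
sc.parallelize(Array[(Int, Option[Boolean])](
(0, Some(true)), (1, Some(false)), (3, None)
)).toDF("id", "column")
// Each derived column combines "column" with a NULL or Boolean literal.
.withColumn("notColumn", !$"column")
.withColumn("andNull", $"column" && null)
.withColumn("orNull", $"column" || null)
.withColumn("andFalse", $"column" && false)
.withColumn("orFalse", $"column" || false)
.withColumn("andTrue", $"column" && true)
.withColumn("orTrue", $"column" || true)
.show()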
+---+------+---------+-------+------+--------+-------+-------+------+
| id|column|notColumn|andNull|orNull|andFalse|orFalse|andTrue|orTrue|
+---+------+---------+-------+------+--------+-------+-------+------+
|  0|  true|    false|   null|  true|   false|   true|   true|  true|
|  1| false|     true|  false|  null|   false|  false|  false|  true|
|  3|  null|     null|   null|  null|   false|   null|   null|  true|
+---+------+---------+-------+------+--------+-------+-------+------+
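
The table follows SQL three-valued logic: `false` decides an AND, `true` decides an OR, and any other combination involving null stays null. As a rough sketch of those rules, here they are modelled in plain Scala without Spark. `None` stands in for SQL NULL, and the `ThreeValuedLogic` object and its method names are hypothetical, chosen for illustration; they are not part of Spark's API.

```scala
// Plain-Scala model of SQL three-valued logic; None stands in for SQL NULL.
// The object and method names are illustrative only, not Spark API.
object ThreeValuedLogic {
  type Tri = Option[Boolean]

  // NOT of an unknown value is still unknown.
  def not(a: Tri): Tri = a.map(!_)

  // false dominates AND: one false operand settles the result,
  // even when the other operand is unknown.
  def and(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // true dominates OR: one true operand settles the result,
  // even when the other operand is unknown.
  def or(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }
}
```

For instance, `ThreeValuedLogic.and(None, Some(false))` evaluates to `Some(false)`, which matches the `andFalse` cell in the row with id 3, while `ThreeValuedLogic.or(None, Some(false))` evaluates to `None`, which matches that row's `orFalse` cell.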