@rainsunny
Last active July 5, 2018 06:17
Pivot function: turn the values of a DataFrame column into columns

Spark DataFrame pivot functions

Turning row values into columns

Before

from pyspark.sql import functions as F  # ntfLog is an existing DataFrame
(ntfLog.groupby("auth_method","auth_result").agg(F.count("*").alias("cnt"))
 .sort("auth_method","auth_result").show(20,False))

Result:

+------------------+-----------+------+
|auth_method       |auth_result|cnt   |
+------------------+-----------+------+
|FACE_RECOGNITION  |false      |41528 |
|FACE_RECOGNITION  |true       |154838|
|NCIIC             |true       |35420 |
|QUICKPAY_SIGN     |false      |15382 |
|QUICKPAY_SIGN     |true       |156307|
|SHORT_PAY_PASSWORD|false      |28698 |
|SHORT_PAY_PASSWORD|true       |157004|
+------------------+-----------+------+

After using pivot

# Listing the pivot values explicitly spares Spark an extra pass to find them.
(ntfLog.groupby("auth_method")
 .pivot("auth_result", ['false','true'])
 .agg(F.count("*"))
 .sort("auth_method")
 .show(20,False)
 )

Result:

+------------------+-----+------+
|auth_method       |false|true  |
+------------------+-----+------+
|FACE_RECOGNITION  |41528|154838|
|NCIIC             |null |35420 |
|QUICKPAY_SIGN     |15382|156307|
|SHORT_PAY_PASSWORD|28698|157004|
+------------------+-----+------+
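
NCIIC shows null in the false column because the "Before" result has no (NCIIC, false) row, so there is nothing to count for that cell. To make the reshaping concrete, here is a minimal plain-Python sketch of what the pivot does to the aggregated rows. It is only an illustration, not how Spark implements pivot; the helper name pivot_rows and the use of strings for the auth_result values are assumptions made for this example.

```python
from collections import defaultdict

# Aggregated rows copied from the "Before" result: (auth_method, auth_result, cnt).
rows = [
    ("FACE_RECOGNITION", "false", 41528),
    ("FACE_RECOGNITION", "true", 154838),
    ("NCIIC", "true", 35420),
    ("QUICKPAY_SIGN", "false", 15382),
    ("QUICKPAY_SIGN", "true", 156307),
    ("SHORT_PAY_PASSWORD", "false", 28698),
    ("SHORT_PAY_PASSWORD", "true", 157004),
]


def pivot_rows(rows, pivot_values):
    """Hypothetical helper: spread (key, pivot_value, cnt) rows into one dict per key."""
    cells = defaultdict(dict)
    for key, pivot_value, cnt in rows:
        cells[key][pivot_value] = cnt
    # A missing (key, pivot_value) pair becomes None, like null in the Spark output.
    return {
        key: {value: found.get(value) for value in pivot_values}
        for key, found in sorted(cells.items())
    }


pivoted = pivot_rows(rows, ["false", "true"])
for method, counts in pivoted.items():
    print(method, counts["false"], counts["true"])
```

In PySpark itself, calling DataFrame.fillna(0) on the pivoted DataFrame before show() is one way to display 0 instead of null, if a missing combination should read as a zero count.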

Reference

https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
