@qingniufly
Forked from timvw/sparkdemo.scala
Created August 23, 2017 04:29
SparkSQL and CTE for increased readability
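// Assumes a spark-shell session (`spark` in scope) and `inputFile` set to the path of a log file.
// spark.read.text yields one row per line, in a single string column named `value`.
val df = spark.read.text(inputFile)
df.createOrReplaceTempView("data")

val query =
  """
    | WITH loglevel AS (SELECT SPLIT(value, ' ')[0] AS level FROM data WHERE LENGTH(value) > 0),
    |      levelcount AS (SELECT level, COUNT(*) AS count FROM loglevel GROUP BY level)
    | SELECT level, count FROM levelcount ORDER BY count DESC
  """.stripMargin

val result = spark.sql(query)
// Print the RDD lineage behind the result, then the row count per log level.
println(result.rdd.toDebugString)
result.show()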
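
To make the role of each CTE easier to follow, here is a minimal sketch of the same three stages on plain Scala collections, with no Spark involved. The object name `LogLevelCount`, the function `countLevels`, and the sample lines are hypothetical and only for illustration. It assumes, as the query does, that the log level is the first space-delimited token of each non-empty line.

```scala
object LogLevelCount {

  // Mirrors the query stage by stage on an in-memory sequence of lines.
  def countLevels(lines: Seq[String]): Seq[(String, Int)] = {
    // loglevel: skip empty lines, keep the first space-delimited token.
    // A limit of -1 keeps trailing empty strings, so a line made only of
    // spaces still yields a first token (the empty string).
    val loglevel: Seq[String] =
      lines.filter(_.nonEmpty).map(_.split(" ", -1)(0))

    // levelcount: group identical levels and count the rows in each group.
    val levelcount: Map[String, Int] =
      loglevel.groupBy(identity).map { case (level, rows) => level -> rows.size }

    // Final SELECT: order by count, highest first.
    levelcount.toSeq.sortBy { case (_, count) => -count }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input standing in for the contents of `inputFile`.
    val sample = Seq(
      "INFO starting job",
      "WARN disk usage high",
      "",
      "INFO job finished",
      "ERROR task failed",
      "WARN retrying task",
      "INFO shutting down"
    )
    countLevels(sample).foreach { case (level, count) => println(s"$level\t$count") }
  }
}
```

Each intermediate `val` plays the part of one named CTE: a later stage refers to the earlier one by name instead of nesting it as a subquery. As in the SQL, the relative order of levels with equal counts is not defined.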