@onastavchuk
Created November 2, 2017 16:33
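import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.functions.split
import org.apache.spark.sql.functions.unix_timestamp

// Casts raw access-log columns to typed ones and splits "request" into verb and resource.
fun castDf(df: Dataset<Row>): Dataset<Row> =
    df.withColumn("_tmp", split(col("request"), " ")).select(
        col("host"),
        // Parse request_time with the given pattern, then expose it as a timestamp named "time".
        unix_timestamp(
            col("request_time"), "dd/MMM/yyyy:HH:mm:ss"
        ).cast("timestamp").alias("time"),
        col("_tmp").getItem(0).alias("verb"),
        col("_tmp").getItem(1).alias("resource"),
        col("status").cast("short"),
        col("bytes")
    ) // "_tmp" is not in the select list, so no trailing drop("_tmp") is needed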