@hoholee12
Created May 31, 2020 12:13
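A short spark-shell (Scala) snippet that parses an Android logcat dump into a typed Spark Dataset:
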
import spark.implicits._

// Normalize '-' and ':' to spaces, then split each logcat line
// ("MM-DD HH:MM:SS.mmm PID TID PRIORITY TAG message") into at most ten
// whitespace-delimited fields.
val logDF = spark.read.textFile("/sparkdata/logcat/test.txt")
  .map(_.replace('-', ' ').replace(':', ' ').trim.split("\\s+", 10))
  .filter(_.length == 10) // skip lines that do not yield all ten fields
  .map { case Array(mm, dd, hr, mn, sc, pid, tid, priority, tag, message) =>
    (mm, dd, hr, mn, sc, pid, tid, priority, tag, message)
  }
  .toDF("mm", "dd", "hr", "mn", "sc", "pid", "tid", "priority", "tag", "message")

// Typed view of the same data.
case class LogParse(mm: String, dd: String, hr: String, mn: String, sc: String,
                    pid: String, tid: String, priority: String, tag: String,
                    message: String)

val logDS = logDF.as[LogParse]
logDS.printSchema()
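
As a quick sanity check (a hypothetical follow-up, not part of the original gist), the typed Dataset can be queried like any DataFrame, for example to tally entries by log priority:

// Hypothetical usage example: count log entries per priority level,
// most frequent first.
logDS.groupBy("priority").count().orderBy($"count".desc).show()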