Skip to content

Instantly share code, notes, and snippets.

@bdnf
Last active November 30, 2020 16:27
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bdnf/af317a71b0017d08de692ed19823fc67 to your computer and use it in GitHub Desktop.
Save bdnf/af317a71b0017d08de692ed19823fc67 to your computer and use it in GitHub Desktop.
Word count from socket using Spark in Scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming._
val spark = SparkSession
.builder
.appName("MnMCount")
.getOrCreate()
val lines = spark
.readStream.format("socket")
.option("host", "localhost")
.option("port", 9999)
.load()
val words = lines.select(split(col("value"), "\\s").as("word"))
val counts = words.groupBy("word").count()
val checkpointDir = "..."
val streamingQuery = counts.writeStream
.format("console")
.outputMode("complete")
.trigger(Trigger.ProcessingTime("1 second"))
.option("checkpointLocation", checkpointDir)
.start()
streamingQuery.awaitTermination()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment