Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save gupta-himanshu/a6c31e0e6c28d8705debaef38b2c215c to your computer and use it in GitHub Desktop.
Save gupta-himanshu/a6c31e0e6c28d8705debaef38b2c215c to your computer and use it in GitHub Desktop.
import org.apache.spark.sql.SparkSession
object StructuredNetworkWordCount extends App {
val spark = SparkSession
.builder
.appName("StructuredNetworkWordCount")
.master("local")
.config("spark.sql.shuffle.partitions", 8)
.getOrCreate()
import spark.implicits._
// Create DataFrame representing the stream of input lines from connection to localhost:9999
val lines = spark.readStream
.format("socket")
.option("host", "localhost")
.option("port", 9999)
.load()
// Split the lines into words
val words = lines.as[String].flatMap(_.split(" "))
// Generate running word count
val wordCounts = words.groupBy("value").count()
// Start running the query that prints the running counts to the console
val query = wordCounts.writeStream
.outputMode("complete")
.format("console")
.start()
query.awaitTermination()
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment