Skip to content

Instantly share code, notes, and snippets.

@bdnf
Created November 30, 2020 16:26
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bdnf/4c81d887fdfca3a5f1845d3a120517b1 to your computer and use it in GitHub Desktop.
Save bdnf/4c81d887fdfca3a5f1845d3a120517b1 to your computer and use it in GitHub Desktop.
Word count from socket example using Spark in Python
from pyspark.sql.functions import *
spark = (SparkSession
.builder
.appName("PythonMnMCount")
.getOrCreate())
lines = (spark
.readStream.format("socket")
.option("host", "localhost")
.option("port", 9999)
.load())
words = lines.select(split(col("value"), "\\s").alias("word"))
counts = words.groupBy("word").count()
checkpointDir = "..."
streamingQuery = (counts
.writeStream
.format("console")
.outputMode("complete")
.trigger(processingTime="1 second")
.option("checkpointLocation", checkpointDir)
.start())
streamingQuery.awaitTermination()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment