Spark console
git clone https://github.com/gaborgsomogyi/spark.git
cd spark
# sbuild -d is a local build alias; the standard Spark build is:
./build/mvn -DskipTests clean package
sbin/start-master.sh
sbin/start-worker.sh spark://gsomogyi-MBP16:7077
Check that http://localhost:8080/ is reachable and the worker is listed as ALIVE.
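Instead of eyeballing the UI, the master status can also be checked programmatically; a minimal sketch, assuming the master web UI's default /json/ status endpoint and an illustrative sample payload:

```python
import json

# Assumption: the Spark master UI serves machine-readable status at
# http://localhost:8080/json/ — fetch it with urllib/curl in practice.
# The payload below is an illustrative sample, trimmed to the fields used.
sample = '''
{
  "url": "spark://gsomogyi-MBP16:7077",
  "workers": [
    {"id": "worker-20240101-1", "state": "ALIVE", "cores": 8, "coresused": 0}
  ],
  "status": "ALIVE"
}
'''

def alive_workers(payload: str) -> list:
    """Return the ids of workers reported as ALIVE."""
    status = json.loads(payload)
    return [w["id"] for w in status.get("workers", []) if w["state"] == "ALIVE"]

print(alive_workers(sample))  # → ['worker-20240101-1']
```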
Spark App console
git clone https://github.com/gaborgsomogyi/spark-test.git
cd spark-test
Write the following streaming app:
import org.apache.spark.sql.functions.col

spark.conf.set("spark.sql.shuffle.partitions", "1")

val input = spark.readStream
  .format("rate")
  .option("numPartitions", "1")
  .option("rowsPerSecond", "5")
  .load()
  .select(col("timestamp"), col("value").mod(5).as("value"))
  // Note: in complete output mode the watermark does not evict aggregation state
  .withWatermark("timestamp", "10 minutes")
  .groupBy("value").count()

val query = input.writeStream.format("noop").outputMode("complete").start()
query.awaitTermination()
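The snippet above assumes an ambient spark session (as in spark-shell). Since spark-submit below runs com.spark.Main, the same code wrapped in a standalone object could look like this sketch (package and object name taken from the submit command, the rest is an assumption; it needs a Spark runtime to execute):

package com.spark

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object Main {
  def main(args: Array[String]): Unit = {
    // The master URL comes from spark-submit, so no .master() here
    val spark = SparkSession.builder()
      .appName("spark-test")
      .getOrCreate()

    spark.conf.set("spark.sql.shuffle.partitions", "1")

    val input = spark.readStream
      .format("rate")
      .option("numPartitions", "1")
      .option("rowsPerSecond", "5")
      .load()
      .select(col("timestamp"), col("value").mod(5).as("value"))
      .withWatermark("timestamp", "10 minutes")
      .groupBy("value").count()

    val query = input.writeStream
      .format("noop")
      .outputMode("complete")
      .start()
    query.awaitTermination()
  }
}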
Compile it
mvn clean package
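The jar name in the submit command below (…-jar-with-dependencies.jar) implies the Maven assembly plugin's jar-with-dependencies descriptor; a minimal pom.xml plugin section sketch (the plugin version is illustrative):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>3.6.0</version>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
    <archive>
      <manifest>
        <mainClass>com.spark.Main</mainClass>
      </manifest>
    </archive>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>single</goal></goals>
    </execution>
  </executions>
</plugin>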
Go back to Spark console
bin/spark-submit --master spark://gsomogyi-MBP16:7077 --deploy-mode client --class com.spark.Main ../spark-test/target/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar