gaborgsomogyi/Spark standalone streaming.md

## Spark standalone streaming.md

      
    Raw
  

              Spark standalone streaming.md
            
          
    Commands

Spark console
git co https://github.com/gaborgsomogyi/spark.git
cd spark
sbuild -d
sbin/start-master.sh
sbin/start-worker.sh spark://gsomogyi-MBP16:7077

Check that http://localhost:8080/ is open and the worker is available.
Spark App console
git co https://github.com/gaborgsomogyi/spark-test.git
cd spark-test

Written the following streaming app
    spark.conf.set("spark.sql.shuffle.partitions", "1")
    val input = spark.readStream
      .format("rate")
      .option("numPartitions", "1")
      .option("rowsPerSecond", "5")
      .load()
      .select(col("timestamp"), col("value").mod(5).as("value"))
      .withWatermark("timestamp", "10 minutes")
      .groupBy("value").count()
    val query = input.toDF().writeStream.format("noop").outputMode(Complete).start()
    query.awaitTermination()

Compile it
mvn clean package

Go back to Spark console
bin/spark-submit --master spark://gsomogyi-MBP16:7077 --deploy-mode client --class com.spark.Main ../spark-test/target/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar