@gaborgsomogyi
Last active September 23, 2020 10:52

Commands

Spark console

git clone https://github.com/gaborgsomogyi/spark.git
cd spark
sbuild -d
sbin/start-master.sh
sbin/start-worker.sh spark://gsomogyi-MBP16:7077
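The worker command above hardcodes one machine's hostname. As a minimal sketch for other machines, the master URL can be assembled from the local hostname. `spark_master_url` is a hypothetical helper, not part of Spark, and it assumes the master advertises itself under the name that `hostname` prints; the URL shown at the top of the master web UI is the authoritative one.

```shell
# Hypothetical helper: build the standalone master URL from a host and port.
# Defaults: the local hostname and 7077 (the port used in the commands above).
spark_master_url() {
  host=${1:-$(hostname)}
  port=${2:-7077}
  printf 'spark://%s:%s\n' "$host" "$port"
}
```

With this defined, the worker could be started as `sbin/start-worker.sh "$(spark_master_url)"`.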

Check that the master web UI at http://localhost:8080/ is reachable and that the worker is listed there.
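The reachability part of this check can be scripted. The sketch below polls the URL with `curl` until it answers; `wait_for_ui` is a hypothetical helper name, and the retry count and timeout are arbitrary choices. It only confirms that the page responds, so whether the worker is listed still has to be confirmed in the browser.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# $1 = URL to probe, $2 = number of attempts (default 10, one second apart).
wait_for_ui() {
  url=$1
  attempts=${2:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: quiet, -f: fail on HTTP errors, --max-time: give up after 2 seconds
    if curl -sf --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_ui http://localhost:8080/ && echo "master UI is up"`.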

Spark App console

git clone https://github.com/gaborgsomogyi/spark-test.git
cd spark-test

Write the following streaming app:

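    // Assumed imports (not shown in the original snippet); `spark` is an existing SparkSession.
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.streaming.OutputMode.Complete

    // Use a single shuffle partition for the aggregation.
    spark.conf.set("spark.sql.shuffle.partitions", "1")
    // Rate source: 5 rows per second from one partition; values are folded into 0..4.
    val input = spark.readStream
      .format("rate")
      .option("numPartitions", "1")
      .option("rowsPerSecond", "5")
      .load()
      .select(col("timestamp"), col("value").mod(5).as("value"))
      .withWatermark("timestamp", "10 minutes")
      .groupBy("value").count()
    // Write the running counts to the no-op sink in complete output mode.
    val query = input.toDF().writeStream.format("noop").outputMode(Complete).start()
    query.awaitTermination()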

Compile it

mvn clean package

Go back to the Spark console and submit the app:

bin/spark-submit --master spark://gsomogyi-MBP16:7077 --deploy-mode client --class com.spark.Main ../spark-test/target/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar
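The submit command depends on the jar built by `mvn clean package` in the sibling `spark-test` directory. As a small sketch, a guard can stop early when that jar is missing; `require_jar` is a hypothetical helper and the error message wording is only illustrative.

```shell
# Hypothetical helper: refuse to continue when the assembled jar is missing.
# $1 = path to the jar that spark-submit is about to receive.
require_jar() {
  if [ ! -f "$1" ]; then
    echo "jar not found: $1 (run 'mvn clean package' in spark-test first)" >&2
    return 1
  fi
}
```

It could be chained in front of the submit, for example `require_jar ../spark-test/target/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar && bin/spark-submit ...` with the same arguments as above.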