Spark Streaming
Dropwizard metrics:
==================
1. Push metrics into Ganglia, Graphite, etc. (can be enabled using a SQL configuration; a sink-side sketch follows this list):
spark.conf.set("spark.sql.streaming.metricsEnabled", "true")
2. Enable INFO or DEBUG logging levels for org.apache.spark.sql.kafka010.KafkaSource to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.sql.kafka010.KafkaSource=DEBUG
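
A minimal sink-side sketch for item 1, assuming a Graphite endpoint (host/port are placeholders) and a plain SparkSession; the metrics.properties keys shown in the comments are the standard Spark GraphiteSink settings:

// Enable streaming metrics so the Dropwizard registry reports them to
// whatever sinks are configured in conf/metrics.properties, e.g.:
//   *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
//   *.sink.graphite.host=<graphite-host>
//   *.sink.graphite.port=<graphite-port>
//   *.sink.graphite.period=10
//   *.sink.graphite.unit=seconds
import org.apache.spark.sql.SparkSession

object StreamingMetricsSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-metrics-sketch")
      .getOrCreate()

    // Per item 1 above: report Dropwizard metrics for streaming queries.
    spark.conf.set("spark.sql.streaming.metricsEnabled", "true")
  }
}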
Techniques:
==========
Source: https://databricks.com/session/apache-spark-streaming-programming-techniques-you-should-know
- Self-contained stream generation
- Refreshing external data
- Structured streaming capability
- Keeping arbitrary state (see the sketch after this list)
- Probabilistic accumulators
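
A minimal sketch of the "keeping arbitrary state" bullet, assuming the built-in rate source for self-contained stream generation and hypothetical Event/RunningCount case classes; mapGroupsWithState is the Structured Streaming API for arbitrary per-key state:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical record types for the sketch.
case class Event(key: String, value: Long)
case class RunningCount(key: String, count: Long)

object ArbitraryStateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("arbitrary-state-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Self-contained stream generation: the rate source emits (timestamp, value)
    // rows, so no external system is needed to try the pipeline.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .select(($"value" % 10).cast("string").as("key"), $"value")
      .as[Event]

    // Keep an arbitrary running count per key across micro-batches.
    val counts = events
      .groupByKey(_.key)
      .mapGroupsWithState(GroupStateTimeout.NoTimeout) {
        (key: String, rows: Iterator[Event], state: GroupState[Long]) =>
          val updated = state.getOption.getOrElse(0L) + rows.size
          state.update(updated)
          RunningCount(key, updated)
      }

    counts.writeStream
      .outputMode(OutputMode.Update())   // mapGroupsWithState emits in Update mode
      .format("console")
      .start()
      .awaitTermination()
  }
}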