@IgorBerman
Last active August 29, 2015 14:14
spark-twitter-streaming-windows
download the twitter4j-core, twitter4j-stream and spark-streaming-twitter jars from maven central and place them into spark/lib
run from bin directory:
spark-shell --jars ../lib/twitter4j-core-3.0.3.jar,../lib/twitter4j-stream-3.0.3.jar,../lib/spark-streaming-twitter_2.10-1.2.0.jar
--------------------------------------------------------------
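import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter._
import java.io.FileWriter
// keep only tweets that mention one of these keywords
val filters = List("postgres", "timeseries")
// 10 second batches on top of the shell's SparkContext (sc)
val ssc = new StreamingContext(sc, Seconds(10))
// None = use twitter4j's default OAuth settings (twitter4j.oauth.* system properties or twitter4j.properties)
val stream = TwitterUtils.createStream(ssc, None, filters)
val statuses = stream.map(status => status.getText())
// open in append mode; the tweets directory must already exist
val fw = new FileWriter("C:/dev/tools/spark-1.2.0-bin-hadoop2.4/tweets/tweets.txt", true)
statuses.foreachRDD { rdd =>
  // collect() brings the batch to the driver, where fw lives
  val tweets = rdd.collect()
  if (tweets.nonEmpty) {
    fw.write(tweets.mkString("", "\n\n", "\n\n"))
    fw.flush()
  }
}
ssc.start()
// let the stream run for a while; once you have enough tweets:
ssc.stop()
fw.close()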
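TwitterUtils.createStream(ssc, None, filters) passes no explicit authorization, so twitter4j falls back on its default configuration, which it can read from twitter4j.oauth.* system properties. A minimal sketch of setting those properties in the shell before creating the stream follows; the credential values and the twitterCredentials name are placeholders introduced here, not part of the original notes, and you would substitute the keys of your own Twitter application.

```scala
// Placeholder credentials (hypothetical values): replace with the keys of your own Twitter app.
val twitterCredentials = Map(
  "consumerKey"       -> "YOUR_CONSUMER_KEY",
  "consumerSecret"    -> "YOUR_CONSUMER_SECRET",
  "accessToken"       -> "YOUR_ACCESS_TOKEN",
  "accessTokenSecret" -> "YOUR_ACCESS_TOKEN_SECRET"
)

// twitter4j picks up its OAuth settings from system properties named twitter4j.oauth.<key>
twitterCredentials.foreach { case (key, value) =>
  System.setProperty(s"twitter4j.oauth.$key", value)
}
```

Run this before the TwitterUtils.createStream line so the properties are in place when twitter4j builds its authorization.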
on windows:
download winutils.exe from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe into the bin directory
add -Dhadoop.home.dir=C:\dev\tools\spark-1.2.0-bin-hadoop2.4 to the java arguments in spark-class2.cmd