Last active Jan 19, 2021
// required imports
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// create the Spark configuration and context
SparkConf conf = new SparkConf().setAppName("Wordcount").setMaster("local[*]");
JavaSparkContext sc = new JavaSparkContext(conf);

// load the data and create an RDD of strings
JavaRDD<String> tweets = sc.textFile("path_To_File");

JavaPairRDD<String, Integer> wordcount = tweets
        // split each line into words (in Spark 2.x+ flatMap expects an Iterator)
        .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
        // mapper step: emit a (word, 1) pair per occurrence
        .mapToPair(word -> new Tuple2<>(word, 1))
        // reducer step: sum the counts for each word
        .reduceByKey((x, y) -> x + y);
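The same mapper/reducer logic can be sketched without a Spark cluster using plain Java streams; this is only an illustration of the flatMap → map-to-pair → reduce-by-key pipeline, and the class and method names (`WordCountSketch`, `countWords`) are hypothetical, not part of the gist.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountSketch {
    // mirrors the Spark pipeline: split lines into words (flatMap),
    // treat each word as a (word, 1) pair, and sum counts per key (reduceByKey)
    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList("a b a", "b c"));
        System.out.println(counts.get("a")); // prints 2
        System.out.println(counts.get("c")); // prints 1
    }
}
```

`Collectors.toMap` with `Integer::sum` as the merge function plays the role of `reduceByKey`: when two entries share a key, their counts are added.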