@nivdul
Last active January 19, 2021 15:03
wordcount
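import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// create the Spark configuration and context (local mode, using all available cores)
SparkConf conf = new SparkConf().setAppName("Wordcount").setMaster("local[*]");
JavaSparkContext sc = new JavaSparkContext(conf);
// load data and create an RDD of strings (one element per line of the file)
JavaRDD<String> tweets = sc.textFile("path_To_File");
// split each line into words; since Spark 2.0, flatMap expects the lambda to return an Iterator
JavaPairRDD<String, Integer> wordcount = tweets.flatMap(line -> Arrays.asList(line.split(" ")).iterator())
    // mapper step: emit the pair (word, 1) for every word
    .mapToPair(word -> new Tuple2<>(word, 1))
    // reducer step: sum the counts of all pairs that share the same word
    .reduceByKey((x, y) -> x + y);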
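To make the three RDD steps (flatMap, mapper, reducer) easier to picture without a Spark installation, here is a minimal single-machine sketch in plain Java. It is not part of the original gist and does not model Spark's partitioning or shuffle. It only mirrors what each step computes. The class name `WordCountSketch`, the method `countWords`, and the sample input are hypothetical and chosen for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; this is not a Spark API.
public class WordCountSketch {

    public static Map<String, Integer> countWords(List<String> lines) {
        // flatMap step: split every line on single spaces and flatten into one list of words
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            words.addAll(Arrays.asList(line.split(" ")));
        }

        // mapper step: pair each word with the count 1, like mapToPair(word -> new Tuple2<>(word, 1))
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : words) {
            pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        }

        // reducer step: sum the values of all pairs sharing a key, like reduceByKey((x, y) -> x + y)
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // hypothetical sample input standing in for the lines of the tweet file
        List<String> tweets = Arrays.asList("spark is fast", "spark is fun");
        Map<String, Integer> counts = countWords(tweets);
        // HashMap iteration order is unspecified, so print individual counts
        System.out.println(counts.get("spark")); // 2
        System.out.println(counts.get("fun"));   // 1
    }
}
```

As in the Spark version, splitting on a single space means that an empty line or two consecutive spaces produce an empty-string token, which is then counted like any other word.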