MishaelRosenthal / GroupByKeySmallNumberOfGroups.scala
Last active May 8, 2020 09:59
RDD group by small number of groups
package core.sparkTest.utils

import java.io._
import java.nio.file.Files
import core.Pimps._
import org.apache.hadoop.io.compress.CompressionCodec
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat