@girisandeep
Last active December 9, 2020 02:54
An example of creating a custom partitioner in Spark using Scala
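import org.apache.spark.Partitioner

// Sends String keys whose first letter comes after 'J' to partition 1 and every other key to partition 0.
class TwoPartsPartitioner(override val numPartitions: Int) extends Partitioner {
  // getPartition can return 1, so at least two partitions must exist.
  require(numPartitions >= 2, "TwoPartsPartitioner needs at least 2 partitions")
  def getPartition(key: Any): Int = key match {
    case s: String if s.nonEmpty && s(0).toUpper > 'J' => 1
    // Empty strings and non-String keys go to partition 0 instead of throwing.
    case _ => 0
  }
}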
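// In spark-shell, sc is the ready-made SparkContext. Start with 5 pairs spread over 3 partitions.
val x = sc.parallelize(Array(("sandeep",1),("giri",1),("abhishek",1),("sravani",1),("jude",1)), 3)
x.glom().collect()
// Array(Array((sandeep,1)), Array((giri,1), (abhishek,1)), Array((sravani,1), (jude,1)))
// i.e. [ [(sandeep,1)], [(giri,1), (abhishek,1)], [(sravani,1), (jude,1)] ]

// Repartition by key: first letters A-J land in partition 0, K-Z in partition 1.
val y = x.partitionBy(new TwoPartsPartitioner(2))
y.glom().collect()
// Array(Array((giri,1), (abhishek,1), (jude,1)), Array((sandeep,1), (sravani,1)))
// i.e. [ [(giri,1), (abhishek,1), (jude,1)], [(sandeep,1), (sravani,1)] ]
// (the order of elements inside a partition may differ after the shuffle)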
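
The routing rule can also be checked without a cluster. The following is a minimal sketch in plain Scala, with no Spark dependency; `RoutingCheck` and `partitionFor` are hypothetical names introduced here, and the empty-string guard is an added assumption rather than part of the gist.

```scala
object RoutingCheck {
  // Same first-letter rule as TwoPartsPartitioner.getPartition, written as a plain function.
  // The nonEmpty guard is an addition so that "" maps to partition 0 instead of throwing.
  def partitionFor(key: String): Int =
    if (key.nonEmpty && key(0).toUpper > 'J') 1 else 0

  def main(args: Array[String]): Unit = {
    val names = Seq("sandeep", "giri", "abhishek", "sravani", "jude")
    // Group the keys by the partition index the rule assigns to them.
    val byPartition = names.groupBy(partitionFor)
    byPartition.toSeq.sortBy(_._1).foreach { case (p, keys) =>
      println(s"partition $p: ${keys.mkString(", ")}")
    }
    // prints:
    // partition 0: giri, abhishek, jude
    // partition 1: sandeep, sravani
  }
}
```

Note that "jude" stays in partition 0 because the comparison is strictly greater than 'J'.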
@shatestest

Can you please show how to write a custom partitioner in Spark SQL 2.3.x?
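
As far as I know, the DataFrame/Dataset API does not accept a custom `Partitioner`; `repartition` only takes a partition count and/or column expressions. One possible workaround, sketched below under that assumption, is to drop to the underlying RDD, key the rows, call `partitionBy`, and rebuild the DataFrame with the original schema. The object name `DataFramePartitionSketch`, the column names, and the `local[2]` master are illustrative choices, not part of the gist. Spark SQL is not told about this layout, so later operations may still shuffle the data.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.sql.SparkSession

// Restated from the gist (with a fallback case) so that this block is self-contained.
class TwoPartsPartitioner(override val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = key match {
    case s: String if s.nonEmpty && s(0).toUpper > 'J' => 1
    case _ => 0
  }
}

object DataFramePartitionSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, reuse the existing `spark`.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("custom-partitioner-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("sandeep", 1), ("giri", 1), ("abhishek", 1), ("sravani", 1), ("jude", 1))
      .toDF("name", "count")

    // Key each Row by the "name" column, apply the custom partitioner, then drop the key again.
    val partitionedRows = df.rdd
      .keyBy(row => row.getString(0))
      .partitionBy(new TwoPartsPartitioner(2))
      .values

    // Rebuild a DataFrame over the repartitioned rows, reusing the original schema.
    val result = spark.createDataFrame(partitionedRows, df.schema)

    // Print the rows held by each partition to inspect the layout.
    result.rdd.glom().collect().zipWithIndex.foreach { case (rows, i) =>
      println(s"partition $i: ${rows.mkString(", ")}")
    }

    spark.stop()
  }
}
```

If partitioning by column value or range is enough, the built-in `df.repartition($"name")` or `df.repartitionByRange($"name")` (the latter available from Spark 2.3) avoids the round trip through the RDD API.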

@srikanthgr1

import org.apache.spark.Partitioner
class TwoPartsPartitioner(override val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = key match {
    case s: String if s.nonEmpty && s(0).toUpper > 'J' => 1
    case _ => 0 // empty strings and non-String keys
  }
  // Spark compares partitioners with == to tell whether an RDD is already partitioned this way.
  override def equals(other: Any): Boolean = other.isInstanceOf[TwoPartsPartitioner]
  override def hashCode: Int = 0
}

var x = sc.parallelize(Array(("sandeep",1),("giri",1),("abhishek",1),("sravani",1),("jude",1)), 3)
x.glom().collect()

//Array(Array((sandeep,1)), Array((giri,1), (abhishek,1)), Array((sravani,1), (jude,1)))
//[ [(sandeep,1)], [(giri,1), (abhishek,1)], [(sravani,1), (jude,1)] ]

var y = x.partitionBy(new TwoPartsPartitioner(2))
y.glom().collect()
//Array(Array((giri,1), (abhishek,1), (jude,1)), Array((sandeep,1), (sravani,1)))
//[ [(giri,1), (abhishek,1), (jude,1)], [(sandeep,1), (sravani,1)] ]
