Feng Fan ffan07039

  • Integral Ad Science
ffan07039 / memcached_lookup_udf.scala
Created March 5, 2021 15:08
A Spark SQL UDF that looks up memcached through a local cache
// Local cache of looked-up values, created lazily on the first lookup.
var localCache: scala.collection.mutable.Map[Int, Int] = null
// Kept null on the driver so that only a null reference is serialized to the executors.
// Known gap: we do not yet know how to close this memCachedClient inside the executors.
var memCachedClient: Any = null
// Runs on the executors: looks up memcached and maintains the local cache as well.
def readPubEntity(id: Int): Int = {
  if (localCache == null) {
    localCache = scala.collection.mutable.Map()
    memCachedClient = new MemcachedClient(new InetSocketAddress(configEndpoint, clusterPort))
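The preview above is cut off before the lookup itself. As a rough illustration of the local-cache idea only, here is a minimal sketch in plain Scala with no memcached dependency. `CachingLookup` and `remoteLookup` are hypothetical names that do not appear in the gist; `remoteLookup` stands in for whatever call the memcached client would make, and the sketch does not reproduce the author's actual code.

```scala
import scala.collection.mutable

// Hypothetical helper (not from the gist): puts an in-memory cache in front of a slow lookup.
// `remoteLookup` is an assumed stand-in for the memcached call.
final class CachingLookup(remoteLookup: Int => Int) {
  // Allocated on first access rather than at construction time.
  private lazy val localCache: mutable.Map[Int, Int] = mutable.Map.empty[Int, Int]

  // Returns the cached value if present; otherwise calls remoteLookup once and stores the result.
  // Synchronized because several tasks may share one JVM and mutable.Map is not thread-safe.
  def get(id: Int): Int = synchronized {
    localCache.getOrElseUpdate(id, remoteLookup(id))
  }

  // Number of distinct ids currently held in the local cache.
  def cachedEntries: Int = synchronized { localCache.size }
}
```

With this shape, repeated calls to `get` for the same id reach the remote side only once, which is the point of keeping a local cache next to the memcached client.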
ffan07039 / estimate_join_performance.scala
Last active March 5, 2021 17:46
estimate join performance by comparing with baseline operation
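/*
 * Estimate join performance by comparing the performance of join_operation() and baseline_operation().
 */
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Baseline: read the tab-separated qlog, keep rows with a positive lookupId, and write them back out.
def baseline_operation(spark: SparkSession, input_path: String, output_path: String): Unit = {
  val qlogDf = spark.read.option("sep", "\t").schema(Schemas.qlogSchema).csv(input_path)
  val filteredDf = qlogDf.filter(col("lookupId") > 0)
  // Overwrite any existing output; the save mode is set through mode().
  filteredDf.write.format("csv").mode("overwrite").option("sep", "\t").option("path", output_path).save()
}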
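The comparison described in the comment can be sketched as a small timing helper: run each operation once, record its wall-clock time, and attribute the difference to the join. `JoinCostEstimate`, `timed`, `joinOverheadSeconds` and `relativeOverhead` are hypothetical names, not part of the gist, and a single run is only a coarse estimate since Spark job times vary between runs.

```scala
// Hypothetical helper (not from the gist) for comparing two timed operations.
object JoinCostEstimate {
  // Runs `block` once and returns its result together with the elapsed wall-clock seconds.
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1e9)
  }

  // Extra seconds attributed to the join: join run time minus baseline run time.
  def joinOverheadSeconds(baselineSeconds: Double, joinSeconds: Double): Double =
    joinSeconds - baselineSeconds

  // The same overhead expressed as a fraction of the baseline run time.
  def relativeOverhead(baselineSeconds: Double, joinSeconds: Double): Double =
    (joinSeconds - baselineSeconds) / baselineSeconds
}
```

In use, one would wrap each call, for example `timed { baseline_operation(spark, in, out) }`, and pass the two elapsed times to `joinOverheadSeconds`. Because Spark evaluates lazily, the timed block has to include the write (or another action) for the measurement to mean anything.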