Taneli Saastamoinen tsaastam

@tsaastam
tsaastam / spark_21109.scala
Created July 1, 2017 20:46
Spark Dataset union & column order
// illustration of https://issues.apache.org/jira/browse/SPARK-21109
// see also https://lobotomys.blogspot.co.uk/2017/07/spark-union-column-order-issue.html
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
// if using spark-shell, skip the next 4 lines
tsaastam / spark_dataframe_ndcg.scala
Created August 13, 2016 19:16
Normalised Discounted Cumulative Gain (NDCG) for Spark DataFrames (with a UserDefinedAggregateFunction)
// Normalised Discounted Cumulative Gain (NDCG) for Spark DataFrames
// See e.g. https://en.wikipedia.org/wiki/Discounted_cumulative_gain
//
// To run this code in the Spark Shell:
//
// 1) https://spark.apache.org/ -> download a binary Spark distribution
// 2) ./bin/spark-shell
// 3) copy-paste!
import org.apache.spark.sql.expressions.UserDefinedAggregateFunction