@christopheblp
Created May 1, 2019 13:24
OrderBy without the Spark Dataset abstraction
import org.apache.spark.sql.{Dataset, SparkSession}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Defined at top level (outside the test class) so Spark can derive an Encoder for it.
case class Root(headers: Map[String, String], body: String)

class OrderByTimeStampTest extends FunSuite with BeforeAndAfterAll {

  val spark: SparkSession = SparkSession.builder
    .master("local[*]")
    .getOrCreate()

  // Release the local Spark session once all tests have run.
  override def afterAll(): Unit = {
    spark.stop()
    super.afterAll()
  }

  test("fill and order dataset") {
    val r1 = Root(Map("test" -> "a", "timestamp" -> "1"), "FirstTest")
    val r2 = Root(Map("test" -> "b", "timestamp" -> "3"), "Test")
    val r3 = Root(Map("test" -> "c", "timestamp" -> "4"), "Test")
    val r4 = Root(Map("test" -> "d", "timestamp" -> "5"), "Test")
    val r5 = Root(Map("test" -> "e", "timestamp" -> "2"), "Test")

    import spark.implicits._

    // For this example the Dataset is built in memory instead of being read with Spark.
    // In the real job, ds is the Dataset obtained by reading the files from HDFS.
    val ds: Dataset[Root] = Seq(r1, r2, r3, r4, r5).toDS

    // SylarBenes's solution: collect the rows to the driver and sort them there as a
    // plain Scala collection, newest timestamp first (sort ascending, then reverse).
    // Note that collect() loads the whole Dataset into driver memory.
    val seq: Seq[Root] = ds.collect().toSeq
    seq.sortBy(x => x.headers.get("timestamp").map(_.toInt)).reverse.foreach(println)
  }
}
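The collect-then-sort step can be tried without a Spark session at all. The following is a minimal sketch, not the author's code: `DriverSideSortSketch`, `timestampOf` and `sortNewestFirst` are hypothetical names, and it assumes (as a deliberate change) that a missing or non-numeric timestamp should sort last instead of throwing a `NumberFormatException` the way `_.toInt` does. It also passes a reversed `Ordering` to `sortBy` rather than calling `.reverse` afterwards; since `sortBy` is stable, records with equal timestamps then keep their original relative order.

```scala
import scala.util.Try

// Hypothetical, Spark-free sketch of the driver-side sort from the test above.
object DriverSideSortSketch {
  // Same shape as the Root case class used in the Spark test.
  final case class Root(headers: Map[String, String], body: String)

  // Assumption: a missing or non-integer "timestamp" header yields None
  // instead of throwing, unlike the original `_.toInt`.
  def timestampOf(r: Root): Option[Int] =
    r.headers.get("timestamp").flatMap(s => Try(s.toInt).toOption)

  // Newest first. Option's Ordering places None before any Some, so the
  // reversed Ordering puts records without a usable timestamp at the end.
  // sortBy is stable, so ties keep their input order.
  def sortNewestFirst(rows: Seq[Root]): Seq[Root] =
    rows.sortBy(timestampOf)(Ordering[Option[Int]].reverse)

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Root(Map("test" -> "a", "timestamp" -> "1"), "FirstTest"),
      Root(Map("test" -> "b", "timestamp" -> "3"), "Test"),
      Root(Map("test" -> "c", "timestamp" -> "4"), "Test"),
      Root(Map("test" -> "d", "timestamp" -> "5"), "Test"),
      Root(Map("test" -> "e", "timestamp" -> "2"), "Test")
    )
    sortNewestFirst(rows).foreach(println)
  }
}
```

With the five sample records, `main` prints them in the order d, c, b, e, a (by their `test` header). For comparison, keeping the sort on the cluster would look something like `ds.orderBy(col("headers").getItem("timestamp").cast("int").desc)` (with `org.apache.spark.sql.functions.col`), which avoids pulling the whole Dataset onto the driver.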