@tonyfraser
tonyfraser / loadJsonDfFromWeb.scala
Last active July 25, 2019 16:39
Load JSON into a DataFrame from a web URL
// run from spark shell
// you'd probably never do this, but just in case you ever wanted to.
import scala.io.Source
import spark.implicits._
val url = "https://raw.githubusercontent.com/sitepoint-editors/json-examples/master/src/db.json"
val json = Source.
  fromURL(url).
import spark.implicits._
val df = Seq(
  ("bravo", "southern charm", "b-sc-first-episode", true, false, 5, 10),
  ("bravo", "southern charm", "b-sc-second-episode", false, false, 11, 22),
  ("bravo", "vanderpump", "b-v-first-episode", true, false, 3, 6),
  ("bravo", "vanderpump", "b-v-second-episode", false, false, 4, 8),
  ("syfy", "krypton", "s-kr-first-episode", false, true, 2, 4),
  ("syfy", "below deck", "s-bd-first-episode", true, true, 1, 2)
).toDF("network_name", "show_name", "episode", "in_scope", "supported", "completes", "views").
// -------------------------------------------------------------
// spark/scala
// -------------------------------------------------------------
// Union two DataFrames and return the records that differ (rows of df2 not present in df)
val diff = df.union(df2).except(df)
// -------------------------------------------------------------
// UDF for turning empty DataFrame cells into null DataFrame cells
tonyfraser / minMaxValueOfAllIntColumns.scala
Last active March 11, 2019 20:25
Find the minimum and maximum values of each row in a Scala Spark DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row
import spark.implicits._
import org.apache.spark.sql.functions.udf
// https://stackoverflow.com/questions/55083109/how-to-get-min-and-max-value-from-multiple-columns-in-a-dataframe-in-spark
// Assumes the first column is an index (probably a string) and all other columns are integers.
val df = Seq(
("r0", 0, 2, 3),
tonyfraser / utcToNYTime.scala
Created February 7, 2019 17:17
Use DateTimeFormatter and ZoneId to convert from UTC to New York time.
import java.time.format.DateTimeFormatter
import java.time.LocalDateTime
import java.time.ZoneId
val ny = ZoneId.of("America/New_York")
val utc = ZoneId.of("UTC")
// Read the clock in UTC; a bare LocalDateTime.now would use the system default zone.
val dateTime = LocalDateTime.now(utc).atZone(utc)
val nyTime = DateTimeFormatter.
  ofPattern("yyyy-MMM-dd HH:mm z").
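The preview above is cut off before the conversion itself, so the following is only a minimal sketch of one way a UTC-to-New-York conversion can be written with `java.time`, not necessarily what the gist does. The object name `UtcToNewYork`, the method `toNewYork`, the use of `withZoneSameInstant`, and the explicit `Locale.US` are assumptions introduced for illustration; the zone IDs and the pattern come from the snippet.

```scala
import java.time.{ZoneId, ZonedDateTime}
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical helper, not part of the original gist.
object UtcToNewYork {
  private val ny  = ZoneId.of("America/New_York")
  private val utc = ZoneId.of("UTC")

  // Same pattern as the snippet; Locale.US is an assumption so that
  // month and zone names do not depend on the machine's default locale.
  private val formatter =
    DateTimeFormatter.ofPattern("yyyy-MMM-dd HH:mm z", Locale.US)

  // Current instant, expressed in UTC.
  def nowUtc(): ZonedDateTime = ZonedDateTime.now(utc)

  // Shift the same instant into the New York zone, then format it.
  def toNewYork(time: ZonedDateTime): String =
    formatter.format(time.withZoneSameInstant(ny))
}
```

For example, `UtcToNewYork.toNewYork(UtcToNewYork.nowUtc())` formats the current instant in New York time. `withZoneSameInstant` keeps the instant and changes only the zone, so 17:17 UTC on a February day comes out as 12:17 in New York, which is five hours behind UTC in winter.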
tonyfraser / Mail.scala
Created December 21, 2017 20:40 — forked from mariussoutier/Mail.scala
Sending mails fluently in Scala
package object mail {
  implicit def stringToSeq(single: String): Seq[String] = Seq(single)
  implicit def liftToOption[T](t: T): Option[T] = Some(t)
  sealed abstract class MailType
  case object Plain extends MailType
  case object Rich extends MailType
  case object MultiPart extends MailType