Skip to content

Instantly share code, notes, and snippets.

@falkerl
Created March 22, 2021 11:52
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save falkerl/f62fd4d4b60eb2a36ed1e9073d9b17db to your computer and use it in GitHub Desktop.
Save falkerl/f62fd4d4b60eb2a36ed1e9073d9b17db to your computer and use it in GitHub Desktop.
EEA & UK deaths: DIC and CVST
test("EEA_UK_deaths") {
val df = spark.read.option("header", true)
.csv("/Users/elena/Downloads/Morticd10_part*")
df.createTempView("data")
val code2country = spark.read.option("header", true)
.csv("/Users/elena/Downloads/WHO_country_codes.csv")
val filteredCounties = spark.read.option("header", true).option("sep", ";")
.csv("/Users/elena/Downloads/EEA_UK_country_list.csv")
code2country.createTempView("code")
filteredCounties.createTempView("filtered")
val res = spark.sql(
"""select d.year, d.cause, c.name, count(*)
|from data d
|join code c on c.country = d.country
|where c.name in (select f.name from filtered f)
| and c.cause in ('D65', 'I676')
|group by d.year, d.cause, c.name
|""".stripMargin)
res.coalesce(1).write.option("header", true).csv("/Users/elena/Downloads/output")
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment