@falkerl
Last active Mar 9, 2021
Vaccine combinations
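import org.apache.spark.sql.functions.{col, lit}

// Load the CSV: one row per vaccine (ID); every other column is treated as a disease flag.
val df = spark.read.option("header", true)
  .csv("/Users/elena/Downloads/vaccine_combinations.csv")
df.createTempView("data")

// Every column except ID names a disease.
val diseases = df.columns.filter(_ != "ID")

// Unpivot into (ID, disease) rows, keeping only cells equal to 1.
diseases.map(d => df.where(col(d) === lit(1)).select(col("ID"), lit(d).as("disease")))
  .reduce(_ union _)
  .createTempView("vac2dis")

// Count unordered vaccine pairs (v1.ID < v2.ID) that share no disease.
spark.sql(
  """select count(*)
    |from data as v1
    |join data as v2 on v1.ID < v2.ID
    |where not exists (
    |  select 1
    |  from vac2dis d1
    |  join vac2dis d2
    |    on d1.disease = d2.disease
    |  where d1.ID = v1.ID and d2.ID = v2.ID
    |)
    |""".stripMargin
).show()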
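
The query above counts unordered pairs of vaccines whose disease sets do not overlap. As a rough cross-check, the same logic can be sketched with plain Scala collections on a tiny hand-made dataset. The `coverage` map and its vaccine and disease names are hypothetical and are not taken from the CSV.

```scala
// Hypothetical toy data: vaccine ID -> set of diseases it covers.
val coverage: Map[String, Set[String]] = Map(
  "A" -> Set("measles", "mumps"),
  "B" -> Set("polio"),
  "C" -> Set("mumps", "rubella"),
  "D" -> Set("tetanus")
)

// Enumerate each unordered pair of IDs once and keep the pairs
// whose disease sets have an empty intersection.
val disjointPairs: List[(String, String)] =
  coverage.keys.toSeq.sorted.combinations(2).collect {
    case Seq(a, b) if (coverage(a) intersect coverage(b)).isEmpty => (a, b)
  }.toList

println(disjointPairs.size) // prints 5: every pair except (A, C), which shares "mumps"
```

This enumerates all pairs in memory, so it is only suitable for checking the Spark result on small inputs.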
falkerl commented Mar 9, 2021

The dataset can be found here.