@falkerl
Last active Mar 9, 2021
Vaccine combinations
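import org.apache.spark.sql.functions.{col, lit}

// Load the vaccine table: one row per vaccine (ID) and one column per disease,
// set to 1 when the vaccine covers that disease.
val df = spark.read.option("header", true)
  .csv("/Users/elena/Downloads/vaccine_combinations.csv")
df.createOrReplaceTempView("data")

// Every column except ID names a disease.
val diseases = df.columns.filter(_ != "ID")

// Unpivot into (ID, disease) rows, one per disease a vaccine covers.
diseases.map(d => df.where(col(d) === lit(1)).select(col("ID"), lit(d).as("disease")))
  .reduce(_ union _)
  .createOrReplaceTempView("vac2dis")

// Count unordered pairs of vaccines that share no disease.
spark.sql(
  """select count(*)
    |from data as v1
    |join data as v2 on v1.ID < v2.ID
    |where not exists (
    |  select 1
    |  from vac2dis d1
    |  join vac2dis d2
    |    on d1.disease = d2.disease
    |  where d1.ID = v1.ID and d2.ID = v2.ID
    |)
    |""".stripMargin
).show()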
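As a rough cross-check of what the query counts, the same logic can be sketched in plain Scala without Spark. The `coverage` map and the `countDisjointPairs` helper below are hypothetical names with made-up toy data, not part of the original dataset; the sketch only assumes that each vaccine maps to the set of diseases it covers.

```scala
// Hypothetical toy data (not from the real CSV): vaccine ID -> diseases it covers.
val coverage: Map[String, Set[String]] = Map(
  "A" -> Set("measles", "mumps"),
  "B" -> Set("rubella"),
  "C" -> Set("mumps", "rubella"),
  "D" -> Set("polio")
)

// Count unordered pairs of vaccines whose disease sets do not intersect.
def countDisjointPairs(cov: Map[String, Set[String]]): Int = {
  val ids = cov.keys.toSeq.sorted
  val pairs = for {
    a <- ids
    b <- ids
    if a < b                             // plays the role of `v1.ID < v2.ID`
    if cov(a).intersect(cov(b)).isEmpty  // plays the role of `not exists (...)`
  } yield (a, b)
  pairs.size
}

println(countDisjointPairs(coverage)) // prints 4: A-B, A-D, B-D, C-D
```

Running the Spark version on a small hand-built CSV and comparing it with a check like this is one way to gain confidence in the `not exists` subquery before pointing it at the full file.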
@falkerl commented Mar 9, 2021

The dataset can be found here.
