Last active
March 9, 2021 19:54
Vaccine combinations
import org.apache.spark.sql.functions.{col, lit}

// Load the CSV with a header row: an ID column plus one column per disease.
val df = spark.read.option("header", true)
  .csv("/Users/elena/Downloads/vaccine_combinations.csv")
df.createTempView("data")

// Every column except ID is treated as a disease.
val diseases = df.columns.filter(_ != "ID")

// Unpivot into (ID, disease) rows: one row for each disease column flagged with 1.
diseases.map(d => df.where(col(d) === lit(1)).select(col("ID"), lit(d).as("disease")))
  .reduce(_ union _)
  .createTempView("vac2dis")

// Count unordered pairs of IDs (v1.ID < v2.ID) that share no disease.
spark.sql(
  """select count(*)
    |from data as v1
    |join data as v2 on v1.ID < v2.ID
    |where not exists (
    |  select 1
    |  from vac2dis d1
    |  join vac2dis d2
    |    on d1.disease = d2.disease
    |  where d1.ID = v1.ID and d2.ID = v2.ID
    |)
    |""".stripMargin
).show()
The dataset can be found here.