@meg-codes
Created April 19, 2019 18:46
PPA Collection UpSetR plot
library(tidyr)
library(dplyr)
library(UpSetR)
# read in the data
ppa <- read.csv("ppa-digitizedworks-20190419T18_24_42.csv")
# subset to only the needed columns and split Collection into multiple rows
new_df <- ppa %>% select("Title", "Source.ID", "Collection") %>% separate_rows("Collection", sep=";")
# give a truth column value to map on spread
new_df$truthy <- 1
# spread and add 1 for an existing value, fill 0 otherwise, and then remove unneeded column V1
spread_out <- new_df %>% spread(Collection, truthy, fill=0) %>% select(-c("V1"))
# group by source.id and title, then aggregate using sum
grouped <- spread_out %>% group_by(Source.ID, Title) %>% summarize_all(sum)
# convert the grouped tibble to a plain data frame and plot up to 7 sets
upset(as.data.frame(grouped), text.scale = 2, line.size = 2, point.size=4, nsets=7)
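# A minimal sketch of the same reshaping on a toy data frame, so the steps above
# can be checked without the PPA export. The titles, IDs, and collection names
# here are made up for illustration, and pivot_wider() (tidyr >= 1.0.0) is
# swapped in for spread(); it is not what the script above uses.
toy <- data.frame(
  Title = c("Work A", "Work B", "Work C"),
  Source.ID = c("id1", "id2", "id3"),
  Collection = c("Alpha;Beta", "Alpha", "Beta;Gamma"),
  stringsAsFactors = FALSE
)
# one row per work, one 0/1 membership column per collection
toy_matrix <- toy %>%
  separate_rows(Collection, sep = ";") %>%
  mutate(truthy = 1) %>%
  pivot_wider(names_from = Collection, values_from = truthy,
              values_fill = list(truthy = 0))
# same plotting call as above, limited to the three toy sets
upset(as.data.frame(toy_matrix), nsets = 3)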