Skip to content

Instantly share code, notes, and snippets.

@danielecook
Last active March 14, 2018 06:55
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save danielecook/ed85b630615ea4c95f6d1062c31d3e89 to your computer and use it in GitHub Desktop.
Save danielecook/ed85b630615ea4c95f6d1062c31d3e89 to your computer and use it in GitHub Desktop.
rarefaction.R
library(tidyverse)
# bcftools query -f "[%GT\t]\n" WI.20170531.impute.vcf.gz | awk '{ gsub(":GT", "", $0); gsub("(# )?\[[0-9]+\]","",$0); print $0 }' | sed 's/0|0/0/g' | sed 's/1|1/1/g' | sed 's/0|1/NA/g' | sed 's/1|0/NA/g' | head -n 1000000 | gzip > ~/Desktop/impute_gts.test.tsv.gz
df_use <- tseries::read.matrix(gzfile("~/Desktop/impute_gts.tsv.gz"))
storage.mode(df_use) <- "logical"
# cols
results <- sapply(1:10, function(x) {
cumsum(
(tidyr::complete(
(plyr::count(
apply(
df_use[,sample(n_samples, n_samples)],
1,
function(x) {
min(which(x))
}
)
)
),
x = 1:249,
fill = list(freq = as.integer(0))
))$freq
)
})
colnames(results) <- 1:ncol(results)
results <- tbl_df(results) %>%
tidyr::gather(rep, sites) %>%
dplyr::group_by(rep) %>%
dplyr::mutate(isotypes = dplyr::row_number(rep)) %>%
dplyr::ungroup() %>%
dplyr::group_by(isotypes)
summarized_results <- results %>%
dplyr::group_by(isotypes) %>%
dplyr::summarize(mean_sites = mean(sites))
ggplot(results) +
geom_step(aes(y = sites, x = isotypes, color=rep), size = 0.5, alpha=0.5) +
geom_step(aes(y = mean_sites, x = isotypes), size=2, data=summarized_results) +
scale_y_continuous(sec.axis = sec_axis(~./nrow(df_use)*100)) +
theme(legend.position='none')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment