Levi Waldron lwaldron

@lwaldron
lwaldron / anscombe_residuals.Rmd
Created June 20, 2022 12:28
Residuals plots of the Anscombe datasets
---
title: "Anscombe residuals plots"
author: "Levi Waldron"
date: "`r Sys.Date()`"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
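The preview above ends at the setup chunk, so the plotting code itself is not shown. As a minimal sketch of what residuals plots for the four Anscombe datasets could look like, the block below fits a simple linear regression to each x/y pair of R's built-in `anscombe` data frame and plots residuals against fitted values. It is an illustration only, not the author's code; the object name `fits` and the 2 x 2 panel layout are assumptions.

```r
# Hypothetical sketch: residuals-vs-fitted plots for the four Anscombe datasets.
# Uses only base R and the built-in `anscombe` data frame.
fits <- lapply(1:4, function(i) {
  # Builds the formula y_i ~ x_i for dataset i
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

op <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels
for (i in 1:4) {
  plot(fitted(fits[[i]]), resid(fits[[i]]),
       xlab = "Fitted values", ylab = "Residuals",
       main = paste("Anscombe dataset", i))
  abline(h = 0, lty = 2)    # reference line at zero residual
}
par(op)                     # restore the previous graphics settings
```

The four fits have nearly identical coefficients, which is why the residual patterns, rather than the summary statistics, are what distinguish the datasets.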
@lwaldron
lwaldron / microsud_cmd_reprex.R
Last active June 11, 2022 10:45
Comparing assays from two different downloads
# installed curatedMetagenomicData from github
suppressPackageStartupMessages({
  library(curatedMetagenomicData)
  library(dplyr)
})
# I download the data in two ways: one selects a few CRC studies and the other uses multiple available studies.
# only specific CRC studies downloaded
suppressMessages({
@lwaldron
lwaldron / diabimmune.R
Created February 2, 2022 16:54
Download DIABIMMUNE antibiotics cohort .fna.gz files
# See https://diabimmune.broadinstitute.org/diabimmune/antibiotics-cohort/resources/16s-sequence-data
# The provided command `wget -r -np -nd https://pubs.broadinstitute.org/diabimmune/data/15` does not work because files are listed in an html page
library(dplyr)
library(rvest)
url <- "https://diabimmune.broadinstitute.org/diabimmune/data/15/"
url %>%
  read_html() %>%
  html_elements("a") %>%
  html_attr("href") %>%
  download.file(., destfile = basename(.))
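The pipeline above passes every link on the page to `download.file()`. A slightly more defensive variant might first keep only the `.fna.gz` links and turn them into full URLs. The sketch below shows that filtering step in base R on a made-up vector of `href` values; the file names are hypothetical, and it assumes the index page lists relative file names, which has not been checked against the real page.

```r
# Base URL from the gist above
url <- "https://diabimmune.broadinstitute.org/diabimmune/data/15/"

# Hypothetical href values, standing in for the output of html_attr("href");
# the real index page may list different names or absolute links
hrefs <- c("../", "sample1.fna.gz", "sample2.fna.gz", "index.html")

# Keep only the compressed FASTA files
fna_files <- grep("\\.fna\\.gz$", hrefs, value = TRUE)

# Assumption: links are relative file names, so prefix the base URL
fna_urls <- paste0(url, fna_files)

# Download one file at a time (not run here; needs network access)
if (FALSE) {
  for (u in fna_urls) download.file(u, destfile = basename(u), mode = "wb")
}
```

Looping over the URLs avoids relying on `download.file()` accepting a vector, which it only does with the `"libcurl"` method.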
@lwaldron
lwaldron / NYC-COVID_ACS_merge
Created September 13, 2021 01:57
NYC-COVID data merged with ACS community-level data
##### Importing COVID-19 data from the NYC DOHMH GitHub (https://github.com/nychealth/coronavirus-data) and merging with ACS data of interest
# To get the URL of a table of interest, go to the table and click on 'History' in the top right corner.
# This page shows the upload history for the table. Choose a time point of interest and click on the
# second-to-last button on the right (if you hover over the button it should say 'View at this point in the history').
# You will be directed to view the table. Then click on 'Raw' and copy the URL.
covid <- read.csv("https://raw.githubusercontent.com/nychealth/coronavirus-data/7ce1b84610232be9c3f780484865a51f73b8c469/recent/recent-4-week-by-modzcta.csv")
head(covid)
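The 'History' and 'Raw' steps described in the comments above yield a URL with a fixed shape: repository, commit hash, then file path. As a small sketch, the same URL can be assembled from those three parts, which makes it easier to switch to another commit or table. The helper name `raw_github_url` is hypothetical and not part of the original gist.

```r
# Hypothetical helper: build a raw.githubusercontent.com URL pinned to one commit
raw_github_url <- function(repo, commit, path) {
  paste("https://raw.githubusercontent.com", repo, commit, path, sep = "/")
}

covid_url <- raw_github_url(
  repo   = "nychealth/coronavirus-data",
  commit = "7ce1b84610232be9c3f780484865a51f73b8c469",
  path   = "recent/recent-4-week-by-modzcta.csv"
)

# Reading the table needs network access, so it is left commented out here
# covid <- read.csv(covid_url)
```

Pinning the commit hash means the same file contents are read even after the upstream repository is updated.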
@lwaldron
lwaldron / framingham.R
Created September 13, 2021 01:50
Framingham Heart Study access and recoding
##### Importing Framingham Heart Study data from a github repository (https://github.com/GauravPadawe/Framingham-Heart-Study)
library(tidyverse)
#importing the dataset
chddata <- read.csv("https://raw.githubusercontent.com/GauravPadawe/Framingham-Heart-Study/adcc828b8a5b3ddbd8d5b8b98e2b27cf60538db6/framingham.csv")
#some recoding
chddataclean <- chddata %>%
  mutate(TenYearCHD = if_else(TenYearCHD == '1', "CHD", "No-CHD"),
@lwaldron
lwaldron / scMultiome.R
Created June 15, 2021 12:16
Object serialization and sizes of SingleCellMultiModal::scMultiome dataset
library(SingleCellMultiModal)
library(MultiAssayExperiment)
suppressMessages(scmm <- scMultiome(dry.run = FALSE))
format(object.size(scmm), units="Mb") #31Mb in memory
saveHDF5MultiAssayExperiment(scmm)
dir("h5_mae", full.names=TRUE) |> file.info() # ~193MB on disk
suppressMessages(scmm_sparse <- scMultiome(format = "MTX", dry.run = FALSE))
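The comparison above contrasts the in-memory size reported by `object.size()` with the size of the serialized files on disk. The same kind of measurement can be sketched in base R without downloading the dataset; the matrix `m` and the use of `saveRDS()` are stand-ins chosen for illustration, not the HDF5 serialization used in the gist.

```r
# Stand-in object: a dense numeric matrix of zeros (not the scMultiome data)
m <- matrix(0, nrow = 1000, ncol = 100)

# Size in memory
mem_size <- object.size(m)
format(mem_size, units = "Mb")

# Size on disk after serializing to a temporary file
f <- tempfile(fileext = ".rds")
saveRDS(m, f)
disk_size <- file.size(f)  # bytes
disk_size
```

The two numbers can differ in either direction: compression shrinks this all-zero matrix on disk, whereas the gist reports an on-disk footprint several times larger than the in-memory object.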
@lwaldron
lwaldron / TCGA_re.R
Created June 1, 2021 06:48
CPU and memory footprints of a few operations on RaggedExperiment objects from TCGA
## ---------------------------------------------------------------------------------------------------------------------------------
library(curatedTCGAData)
library(TCGAutils)
library(RaggedExperiment)
## -----------------------------------------------------------------------------------------------------------------------------------------
cnvdry <-
curatedTCGAData(assays = "CNVSNP",
# I create and discuss this code at https://youtu.be/nU_GEPKVXU8
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
system("gsutil cp gs://biocbbs_2020a/cmd3out/uploads/GuptaA_2019.metaphlan_bugs_list.stool.rda .")
load("GuptaA_2019.metaphlan_bugs_list.stool.rda")
head(rownames(GuptaA_2019.metaphlan_bugs_list.stool)) #first 3 look wrong
grep("CIBIO", rownames(GuptaA_2019.metaphlan_bugs_list.stool)) #there are 60 with CIBIO in the rowname
@lwaldron
lwaldron / cBioPortal tests
Last active September 1, 2020 22:16
Download of full ACC and BRCA datasets, GBM IMPACT341
# to run this using Docker from the command line on the stock Bioconductor image:
# docker run -it bioconductor/bioconductor_docker:latest R
BiocManager::install("cBioPortalData")
library(cBioPortalData)
#acc_tcga full data pack
system.time(accpack <- cBioDataPack("acc_tcga")) #~10 seconds
accpack