@lwaldron
Created May 3, 2024 14:30
curatedMetagenomicData healthy control samples, relab + metadata csv file per age category
library(curatedMetagenomicData)
library(dplyr)

# Healthy-control stool samples with a known age category
sm <- sampleMetadata |>
  filter(
    study_condition == "control",
    disease == "healthy",
    body_site == "stool",
    !is.na(age_category)
  )

# Write one relative-abundance CSV and one metadata CSV per age category.
# Categories are taken from the filtered table, so none of them is empty.
for (agecat in unique(sm$age_category)) {
  sm1 <- filter(sm, age_category == agecat)
  se <- returnSamples(sm1, dataType = "relative_abundance", rownames = "NCBI")
  # Transpose so that samples are rows and taxa are columns
  write.csv(t(assay(se)), file = paste0(agecat, "_relab.csv"))
  write.csv(colData(se), file = paste0(agecat, "_samplemetadata.csv"))
}

lwaldron commented May 3, 2024

This gist writes a relative abundance file, with NCBI IDs in columns and samples in rows, and a corresponding sample metadata file for stool specimens from healthy control participants. I split the files by age category, since the categories will have somewhat different properties:

$ wc -l *relab.csv
    8983 adult_relab.csv
     821 child_relab.csv
    2328 newborn_relab.csv
     229 schoolage_relab.csv
     835 senior_relab.csv
   13196 total
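
Because `write.csv` writes a header line, each count above should be one more than the number of samples in that file (assuming no field contains an embedded newline). A minimal sketch for reading a file back into R and checking this; `read_relab` is a hypothetical helper name, and the toy matrix, its sample names, and its taxon IDs are placeholders rather than real data:

```r
# Hypothetical helper: read one *_relab.csv back as a samples x taxa matrix.
read_relab <- function(path) {
  # row.names = 1 restores the sample IDs; check.names = FALSE keeps numeric
  # NCBI IDs as column names instead of prefixing them with "X".
  as.matrix(read.csv(path, row.names = 1, check.names = FALSE))
}

# Toy stand-in for a real file (placeholder values, not real data).
toy <- matrix(c(60, 40, 0,
                25, 70, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("sampleA", "sampleB"),
                              c("816", "1301", "239935")))
path <- tempfile(fileext = ".csv")
write.csv(toy, file = path)

relab <- read_relab(path)
n_lines <- length(readLines(path))

# One header line plus one line per sample.
stopifnot(n_lines == nrow(relab) + 1)
```

With the real files, replacing `path` with, for example, `"adult_relab.csv"` should give the same relationship between the `wc -l` count and `nrow(relab)`.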

Note that the relative abundances won't always add up to exactly 100%, because some species that could not be mapped to the phylogeny were dropped; these are rare and low in abundance. Note also that an additional 1,301 control samples from body sites other than stool are not included here, but they are available if you want them. Finally, we'll be re-running these and some (possibly tens of) thousands more specimens through MetaPhlAn4, which will add a large number of Species Genome Bins: putative species, based on high-quality metagenome assemblies, that have not yet been isolated or named (or assigned NCBI identifiers).
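
One way to see how far each sample falls short of 100% is to sum each row of a relab file. A minimal sketch, assuming the layout described above (samples in rows, NCBI IDs in columns) and abundances expressed in percent; `relab_shortfall` is a hypothetical helper name, and the toy values are placeholders rather than real data:

```r
# Hypothetical helper: percentage points missing from each sample's total.
relab_shortfall <- function(path) {
  relab <- as.matrix(read.csv(path, row.names = 1, check.names = FALSE))
  100 - rowSums(relab)
}

# Toy stand-in for a real file (placeholder values, not real data).
toy <- matrix(c(60, 39.5, 0.4,
                25, 70,   5),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("sampleA", "sampleB"),
                              c("816", "1301", "239935")))
path <- tempfile(fileext = ".csv")
write.csv(toy, file = path)

shortfall <- relab_shortfall(path)
print(round(shortfall, 2))
```

On the real files, `summary(relab_shortfall("adult_relab.csv"))` would give a quick view of how much abundance the dropped species account for per sample.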
