Created January 9, 2023 00:32
Reproducible workflow for generating Ruegeria pomeroyi DSS-3 databases

The following is a shortened version of the reproducible workflow for generating the R. pomeroyi databases. A longer workflow with additional detail is included in the datapack when downloading the databases from Zenodo. (Zenodo link:

The databases were created using anvi'o v7.1-dev. First, the contigs database was generated from the genome sequence FASTA file and an external gene calls file describing each gene call. Curated functional annotations from the Moran Lab (including the TnSeq mutants) were imported from tab-delimited files describing each annotation, and automatic functional annotations were added by running the anvi'o annotation programs for HMMs, NCBI COGs, KEGG KOfams, and Pfams.

# make the database
anvi-gen-contigs-database -f R-pom_genome.fasta --external-gene-calls DSS3_external_gene_calls_NEW.txt --skip-predict-frame -n R_POMEROYI_DSS3 -o R_POM_DSS3-contigs.db

# import curated annotations
anvi-import-functions -c R_POM_DSS3-contigs.db -i SPO_external_functions.txt
anvi-import-functions -c R_POM_DSS3-contigs.db -i GENE_ID_external_functions.txt
anvi-import-functions -c R_POM_DSS3-contigs.db -i LOCUS_external_functions.txt

# de novo annotation
anvi-run-hmms -c R_POM_DSS3-contigs.db
anvi-run-scg-taxonomy -c R_POM_DSS3-contigs.db
anvi-run-ncbi-cogs -c R_POM_DSS3-contigs.db -T 6
anvi-run-kegg-kofams -c R_POM_DSS3-contigs.db -T 6
anvi-run-pfams -c R_POM_DSS3-contigs.db -T 6

# import TnSeq mutant annotations
anvi-import-functions -c R_POM_DSS3-contigs.db -i TnSeq_annotations.txt
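The tab-delimited annotation files themselves are not shown in this shortened workflow. As a minimal sketch, assuming they follow the five-column functions-txt layout that anvi'o documents (gene_callers_id, source, accession, function, e_value), a header check before running anvi-import-functions might look like the following. The function name check_functions_header is hypothetical and not part of anvi'o.

```shell
# Hypothetical helper (not part of anvi'o): succeed only if the first line of
# the given file is the assumed five-column, tab-separated functions header.
check_functions_header() {
    local expected
    expected=$(printf 'gene_callers_id\tsource\taccession\tfunction\te_value')
    [ "$(head -n 1 "$1")" = "$expected" ]
}

# Example use (commented out so the sketch has no side effects):
# for f in SPO_external_functions.txt TnSeq_annotations.txt; do
#     check_functions_header "$f" || echo "unexpected header in $f" >&2
# done
```

Catching a malformed header up front is cheaper than discovering it partway through a series of imports.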

Next, the (meta)transcriptome samples were downloaded and quality-filtered with the FASTX-Toolkit, using the parameters given in the methods section of Landa et al.:

Quality control was performed on 249 million 50-bp reads (10±2 million reads per sample; Supplementary Table S1) using the FASTX toolkit, imposing a minimum quality score of 20 over 80% of read length.
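The exact FASTX-Toolkit command is not given here; the criterion presumably corresponds to fastq_quality_filter with -q 20 -p 80. To make the criterion itself concrete, here is a rough awk sketch of the same rule. It is an illustration only, not the tool used in the workflow, and it assumes 4-line FASTQ records with Phred+33 quality encoding. The function name filter_q20_p80 is hypothetical.

```shell
# Sketch of the "quality >= 20 over >= 80% of the read" rule in plain awk.
# Assumptions: 4-line FASTQ records, Phred+33 encoding.
# Reads FASTQ on stdin and writes the passing records to stdout.
filter_q20_p80() {
    awk -v minq=20 -v minpct=80 '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        { rec[NR % 4] = $0 }          # buffer the current 4-line record
        NR % 4 == 0 {                 # 4th line of a record holds the qualities
            n = length($0); good = 0
            for (i = 1; i <= n; i++)
                if (ord[substr($0, i, 1)] - 33 >= minq) good++
            if (n > 0 && good * 100 >= minpct * n)
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
        }'
}

# Example use (commented out; file names are placeholders):
# filter_q20_p80 < sample.fastq > sample-QC.fastq
```

A read is kept when at least 80% of its bases have a quality score of 20 or higher, so a 50-bp read may contain up to 10 low-quality bases.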

Then the samples were mapped to the R. pomeroyi DSS-3 genome using bowtie2 to obtain BAM files, which were sorted and indexed. Here is an example set of mapping commands for one sample:

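# redirect stdout first, then stderr, so the bowtie2 alignment summary (printed to stderr) ends up in the log
bowtie2 --threads 4 -x 04_MAPPING/R_POM_DSS3 -U $sample_file --no-unal -S 04_MAPPING/${name}.sam > 00_LOGS/${name}-bowtie.log 2>&1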

samtools view -F 4 -bS 04_MAPPING/${name}.sam > 04_MAPPING/${name}-RAW.bam
rm 04_MAPPING/${name}.sam

anvi-init-bam 04_MAPPING/${name}-RAW.bam -o 04_MAPPING/${name}.bam
rm 04_MAPPING/${name}-RAW.bam
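The commands above rely on two shell variables, sample_file and name, whose definitions are not shown. One possible way to derive name from each FASTQ path is sketched below. The helper sample_name and the input directory 02_QC/ are assumptions made for illustration, not part of the original workflow.

```shell
# Hypothetical helper: derive ${name} from a FASTQ path by stripping the
# directory and the common extensions (.gz, .fastq, .fq).
sample_name() {
    local base
    base=$(basename "$1")
    base=${base%.gz}
    base=${base%.fastq}
    base=${base%.fq}
    printf '%s\n' "$base"
}

# Example loop (commented out; 02_QC/ is an assumed input directory):
# for sample_file in 02_QC/*.fastq; do
#     name=$(sample_name "$sample_file")
#     # ... run the bowtie2, samtools and anvi-init-bam commands here ...
# done
```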

Each BAM file was then profiled against the contigs database to produce a single-sample anvi'o profile database:

anvi-profile -c 03_CONTIGS/R_POM_DSS3-contigs.db -i 04_MAPPING/${name}.bam -o 05_PROFILE/${name} -S $sampname -T 4
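How sampname was set is not shown here. To my knowledge anvi'o only accepts sample names made of ASCII letters, digits and underscores that do not begin with a digit, so a file-derived name may need cleaning first. The sketch below is one way to do that; the helper to_sampname and the s_ prefix are arbitrary choices for illustration.

```shell
# Hypothetical helper: turn an arbitrary label into a name made only of
# letters, digits and underscores that does not start with a digit.
to_sampname() {
    local s
    # replace every character outside [A-Za-z0-9_] with an underscore
    s=$(printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9_' '_')
    case $s in
        [0-9]*) s="s_$s" ;;   # prefix names that begin with a digit
    esac
    printf '%s\n' "$s"
}

# Example use (commented out):
# sampname=$(to_sampname "$name")
```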

Finally, all the individual profiles were merged into a single profile database.

anvi-merge -c 03_CONTIGS/R_POM_DSS3-contigs.db -o 06_MERGED 05_PROFILE/*/PROFILE.db
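Because the merge command picks up profiles with a glob, a sample whose profiling step failed would be left out silently. A small pre-merge check, sketched here under the assumption that every sample directory in 05_PROFILE/ should contain a PROFILE.db, can list the directories where it is missing. The function name missing_profiles is hypothetical.

```shell
# Hypothetical helper: print each subdirectory of $1 that lacks a PROFILE.db.
# Returns 0 if every subdirectory has one, 1 otherwise.
missing_profiles() {
    local d status=0
    for d in "$1"/*/; do
        [ -d "$d" ] || continue          # skip the unexpanded glob if empty
        if [ ! -f "${d}PROFILE.db" ]; then
            printf '%s\n' "${d%/}"
            status=1
        fi
    done
    return "$status"
}

# Example use (commented out):
# missing_profiles 05_PROFILE || echo "some samples were not profiled" >&2
```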