Created
January 9, 2023 00:32
ivagljiva created this gist
Jan 9, 2023
The following is a shortened version of the reproducible workflow for generating the *R. pomeroyi* databases. A longer workflow with additional detail is included in the datapack that accompanies the databases on Zenodo. (Zenodo link: [https://zenodo.org/record/7439166#.Y7tgZ-zMJQ0](https://zenodo.org/record/7439166))

The databases were created using anvi'o `v7.1-dev`. First, the contigs database was generated from the genome sequence FASTA file and a [file describing each gene call](https://anvio.org/help/main/artifacts/external-gene-calls/). Curated functional annotations from the Moran Lab (including the TnSeq mutants) were imported from tab-delimited [files describing each annotation](https://anvio.org/help/main/artifacts/functions-txt/), and automatic functional annotations were added by running various annotation programs.
```
# make the database
anvi-gen-contigs-database -f R-pom_genome.fasta \
                          --external-gene-calls DSS3_external_gene_calls_NEW.txt \
                          --skip-predict-frame \
                          -n R_POMEROYI_DSS3 \
                          -o R_POM_DSS3-contigs.db

# import curated annotations
anvi-import-functions -c R_POM_DSS3-contigs.db -i SPO_external_functions.txt
anvi-import-functions -c R_POM_DSS3-contigs.db -i GENE_ID_external_functions.txt
anvi-import-functions -c R_POM_DSS3-contigs.db -i LOCUS_external_functions.txt

# de novo annotation
anvi-run-hmms -c R_POM_DSS3-contigs.db
anvi-run-scg-taxonomy -c R_POM_DSS3-contigs.db
anvi-run-ncbi-cogs -c R_POM_DSS3-contigs.db -T 6
anvi-run-kegg-kofams -c R_POM_DSS3-contigs.db -T 6
anvi-run-pfams -c R_POM_DSS3-contigs.db -T 6

# import TnSeq mutant annotations
anvi-import-functions -c R_POM_DSS3-contigs.db -i TnSeq_annotations.txt
```

After this, the (meta)transcriptome samples were downloaded and quality filtered with [FASTX-toolkit](http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_filter_usage), using the parameters from the methods section of the [Landa et al. paper](https://www.nature.com/articles/ismej2017117):

> Quality control was performed on 249 million 50-bp reads (10±2 million reads per sample; Supplementary Table S1) using the FASTX toolkit, imposing a minimum quality score of 20 over 80% of read length.

Then the samples were mapped to the _R. pomeroyi_ DSS-3 genome using [bowtie2](https://github.com/BenLangmead/bowtie2) to obtain BAM files, which were sorted and indexed.
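As a rough sketch of the quality-filtering step described above, plus the bowtie2 index that the mapping step needs, the commands might look like the following. Several things here are assumptions rather than part of the original workflow: the `01_RAW/` and `02_QC/` paths and sample file names are hypothetical, the `run` helper and `DRY_RUN` variable are illustrative conveniences, and building the index at `04_MAPPING/R_POM_DSS3` is inferred from the index prefix used later with `bowtie2 -x`. The `-q 20 -p 80` values mirror the thresholds quoted from Landa et al.

```shell
# Hypothetical input/output paths; adjust to your own directory layout.
raw_fastq="01_RAW/sample_01.fastq"
qc_fastq="02_QC/sample_01-QC.fastq"

# Illustrative helper: print each command by default, execute it when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# -q 20: minimum quality score; -p 80: minimum percent of bases that must reach it.
# Assumption: Phred+33 encoded reads may additionally need FASTX's -Q33 option.
run fastq_quality_filter -q 20 -p 80 -i "$raw_fastq" -o "$qc_fastq"

# Build the bowtie2 index from the genome FASTA (prefix assumed from the mapping step).
run bowtie2-build R-pom_genome.fasta 04_MAPPING/R_POM_DSS3
```

With the default `DRY_RUN=1` the script only prints the two commands, so they can be reviewed before anything is run; setting `DRY_RUN=0` executes them, provided FASTX-toolkit and bowtie2 are installed.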
Here is an example set of mapping commands for one sample:

```
# map reads; bowtie2 writes its alignment summary to stderr, so both streams go to the log
bowtie2 --threads 4 -x 04_MAPPING/R_POM_DSS3 -U $sample_file --no-unal -S 04_MAPPING/${name}.sam > 00_LOGS/${name}-bowtie.log 2>&1

# convert SAM to BAM, keeping mapped reads only
samtools view -F 4 -bS 04_MAPPING/${name}.sam > 04_MAPPING/${name}-RAW.bam
rm 04_MAPPING/${name}.sam

# sort and index the BAM file
anvi-init-bam 04_MAPPING/${name}-RAW.bam -o 04_MAPPING/${name}.bam
rm 04_MAPPING/${name}-RAW.bam
```

Each BAM file was then converted into a single-sample profile database:

```
anvi-profile -c 03_CONTIGS/R_POM_DSS3-contigs.db -i 04_MAPPING/${name}.bam -o 05_PROFILE/${name} -S $sampname -T 4
```

Finally, all the individual profiles were merged into a single profile database:

```
anvi-merge -c 03_CONTIGS/R_POM_DSS3-contigs.db -o 06_MERGED 05_PROFILE/*/PROFILE.db
```
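The mapping and profiling commands above rely on `$sample_file`, `${name}`, and `$sampname`, which this shortened workflow does not define. One possible way to derive them is sketched below. The file path is hypothetical, and the sanitizing step assumes that anvi'o sample names should contain only ASCII letters, digits, and underscores; the longer workflow in the datapack may set these variables differently.

```shell
# Hypothetical quality-filtered read file for one sample.
sample_file="02_QC/sample-01_QC.fastq"

# File name without its directory or .fastq extension; used for SAM/BAM/log file names.
name=$(basename "$sample_file" .fastq)

# Sample name for anvi-profile -S: replace every character that is not a letter,
# digit, or underscore with an underscore (assumed naming restriction).
sampname=$(printf '%s' "$name" | LC_ALL=C tr -c 'A-Za-z0-9_' '_')

echo "name=$name sampname=$sampname"
```

Keeping `name` and `sampname` separate lets the files on disk retain their original names while the profile database receives a sanitized sample name.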