@ivagljiva
Created January 9, 2023 00:32

R_pom_workflow.md

The following is a shortened version of the reproducible workflow for generating the *R. pomeroyi* databases. A longer workflow with additional detail is included in the datapack when downloading the databases from Zenodo.
(Zenodo link: [https://zenodo.org/record/7439166](https://zenodo.org/record/7439166))

The databases were created using anvi'o `v7.1-dev`. First, the contigs database was generated from the genome sequence FASTA file and a [file describing each gene call](https://anvio.org/help/main/artifacts/external-gene-calls/). Curated functional annotations from the Moran Lab (including the TnSeq mutants) were imported from tab-delimited [files describing each annotation](https://anvio.org/help/main/artifacts/functions-txt/), and automatic functional annotations were added by running several annotation programs (HMMs, SCG taxonomy, NCBI COGs, KEGG KOfams, and Pfams).

```
# make the database
anvi-gen-contigs-database -f R-pom_genome.fasta --external-gene-calls DSS3_external_gene_calls_NEW.txt --skip-predict-frame -n R_POMEROYI_DSS3 -o R_POM_DSS3-contigs.db
# import curated annotations
anvi-import-functions -c R_POM_DSS3-contigs.db -i SPO_external_functions.txt
anvi-import-functions -c R_POM_DSS3-contigs.db -i GENE_ID_external_functions.txt
anvi-import-functions -c R_POM_DSS3-contigs.db -i LOCUS_external_functions.txt
# de novo annotation
anvi-run-hmms -c R_POM_DSS3-contigs.db
anvi-run-scg-taxonomy -c R_POM_DSS3-contigs.db
anvi-run-ncbi-cogs -c R_POM_DSS3-contigs.db -T 6
anvi-run-kegg-kofams -c R_POM_DSS3-contigs.db -T 6
anvi-run-pfams -c R_POM_DSS3-contigs.db -T 6
# import TnSeq mutant annotations
anvi-import-functions -c R_POM_DSS3-contigs.db -i TnSeq_annotations.txt
```
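
Before importing, it can help to confirm that each annotation file starts with the header anvi'o expects. The following is a minimal sketch, assuming the five-column functions-txt layout (`gene_callers_id`, `source`, `accession`, `function`, `e_value`) described on the artifact page linked above. `check_functions_txt` is a hypothetical helper, not an anvi'o program.

```shell
# Hypothetical helper: succeed only if the first line of a tab-delimited
# file matches the assumed functions-txt header.
check_functions_txt() {
    expected=$(printf 'gene_callers_id\tsource\taccession\tfunction\te_value')
    header=$(head -n 1 "$1")
    [ "$header" = "$expected" ]
}

# Possible usage: import only the files that pass the header check.
# for f in SPO_external_functions.txt GENE_ID_external_functions.txt LOCUS_external_functions.txt; do
#     check_functions_txt "$f" && anvi-import-functions -c R_POM_DSS3-contigs.db -i "$f"
# done
```

The check only looks at the header line; it does not validate the annotation rows themselves.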

After this, the (meta)transcriptome samples were downloaded and quality-filtered with [FASTX-toolkit](http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_filter_usage), using the parameters from the methods section of the [Landa et al. paper](https://www.nature.com/articles/ismej2017117):

> Quality control was performed on 249 million 50-bp reads (10±2 million reads per sample; Supplementary Table S1) using the FASTX toolkit, imposing a minimum quality score of 20 over 80% of read length.
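
The quoted parameters map onto two `fastq_quality_filter` options. This is a minimal sketch of how the filtering step might be wrapped, not the exact command used for these samples; the `quality_filter` function name and the input and output paths are placeholders.

```shell
# Quality-filter one FASTQ file with the parameters quoted above:
#   -q 20  minimum quality score to keep
#   -p 80  minimum percent of bases that must have that quality
# $1 is the input FASTQ and $2 is the filtered output (caller-supplied paths).
quality_filter() {
    fastq_quality_filter -q 20 -p 80 -i "$1" -o "$2"
}
```

For example, `quality_filter raw/sample.fastq filtered/sample.fastq` (hypothetical paths). Depending on the quality encoding of the reads, an additional `-Q33` option may be needed.
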

Then the samples were mapped to the *R. pomeroyi* DSS-3 genome using [bowtie2](https://github.com/BenLangmead/bowtie2) to obtain BAM files, which were sorted and indexed. Here is an example set of mapping commands for one sample:

```
# map reads; bowtie2 prints its alignment summary to stderr, so send both streams to the log
bowtie2 --threads 4 -x 04_MAPPING/R_POM_DSS3 -U "$sample_file" --no-unal -S 04_MAPPING/${name}.sam > 00_LOGS/${name}-bowtie.log 2>&1
# keep mapped reads only and convert SAM to BAM
samtools view -F 4 -bS 04_MAPPING/${name}.sam > 04_MAPPING/${name}-RAW.bam
rm 04_MAPPING/${name}.sam
# sort and index the BAM file
anvi-init-bam 04_MAPPING/${name}-RAW.bam -o 04_MAPPING/${name}.bam
rm 04_MAPPING/${name}-RAW.bam
```
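
The mapping commands rely on a bowtie2 index at `04_MAPPING/R_POM_DSS3` and on the `$sample_file` and `${name}` variables, none of which are set in this shortened workflow. The sketch below shows one way they might be prepared. It assumes the index is built from the same genome FASTA used for the contigs database and that the filtered reads end in `.fastq`; both function names are hypothetical.

```shell
# Build the bowtie2 index once; the output prefix matches the -x argument above.
# Using R-pom_genome.fasta as the reference is an assumption.
build_index() {
    bowtie2-build R-pom_genome.fasta 04_MAPPING/R_POM_DSS3
}

# Derive ${name} from a FASTQ path by dropping the directory and the
# .fastq extension (the extension is an assumption about file naming).
sample_name() {
    basename "$1" .fastq
}

# Possible usage (02_QC is a hypothetical directory of filtered reads):
# build_index
# for sample_file in 02_QC/*.fastq; do
#     name=$(sample_name "$sample_file")
#     ... mapping commands for this sample ...
# done
```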

Each BAM file was then profiled with `anvi-profile`, producing a single-sample profile database:

```
anvi-profile -c 03_CONTIGS/R_POM_DSS3-contigs.db -i 04_MAPPING/${name}.bam -o 05_PROFILE/${name} -S $sampname -T 4
```
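
The `-S` value comes from a separate `$sampname` variable rather than `${name}`. How it was set is not shown here. One plausible approach is sketched below, assuming that anvi'o sample names are limited to ASCII letters, digits, and underscores and may not start with a digit; `sanitize_sample_name` and the `s_` prefix are hypothetical choices for illustration.

```shell
# Hypothetical helper: turn a file-derived name into a conservative sample name.
# Every character outside A-Z, a-z, 0-9 and _ becomes an underscore, and a
# leading digit gets an "s_" prefix (assumed naming rule, see the text above).
sanitize_sample_name() {
    clean=$(printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_')
    case $clean in
        [0-9]*) clean="s_$clean" ;;
    esac
    printf '%s\n' "$clean"
}

# Possible usage:
# sampname=$(sanitize_sample_name "$name")
```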

Finally, all the individual profiles were merged into a single profile database:

```
anvi-merge -c 03_CONTIGS/R_POM_DSS3-contigs.db -o 06_MERGED 05_PROFILE/*/PROFILE.db
```
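
Because the merge command picks up profiles through a glob, a sample whose profiling failed is silently left out. A small check like the following could confirm that every sample produced a `PROFILE.db` before merging. It is a sketch only; `count_profiles` is a hypothetical helper.

```shell
# Hypothetical helper: count the single-sample profiles under a directory,
# using the same */PROFILE.db layout as the anvi-merge glob above.
count_profiles() {
    n=0
    for db in "$1"/*/PROFILE.db; do
        # When nothing matches, the pattern stays unexpanded and -e is false.
        if [ -e "$db" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Possible usage: compare against the number of sorted BAM files.
# echo "profiles found: $(count_profiles 05_PROFILE)"
```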