ivagljiva/R_pom_workflow.md

## 50 changes: 50 additions & 0 deletions R_pom_workflow.md
@@ -0,0 +1,50 @@

The following is a shortened version of the reproducible workflow for generating the *R. pomeroyi* databases. A longer workflow with additional detail is included in the datapack when downloading the databases from Zenodo.

(Zenodo link: [https://zenodo.org/record/7439166#.Y7tgZ-zMJQ0](https://zenodo.org/record/7439166))


The databases were created using anvi'o `v7.1-dev`. First, the contigs database was generated from the genome sequence FASTA file and a [file describing each gene call](https://anvio.org/help/main/artifacts/external-gene-calls/). Curated functional annotations from the Moran Lab (including the TnSeq mutants) were imported from tab-delimited [files describing each annotation](https://anvio.org/help/main/artifacts/functions-txt/), and automatic functional annotations were added by running anvi'o's HMM, SCG taxonomy, NCBI COGs, KEGG KOfam, and Pfam annotation programs.
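Because `anvi-import-functions` reads tab-delimited functions-txt files, it can help to check each file's header before importing. The sketch below assumes the five-column layout described on the linked functions-txt page (`gene_callers_id`, `source`, `accession`, `function`, `e_value`). The helper name `check_functions_header` is hypothetical and not part of anvi'o.

```shell
# Hypothetical helper (not an anvi'o program): confirm that a functions-txt
# file starts with the five tab-separated column names assumed above.
check_functions_header() {
    local file="$1"
    local expected header
    expected=$(printf 'gene_callers_id\tsource\taccession\tfunction\te_value')
    header=$(head -n 1 "$file")
    if [ "$header" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "BAD HEADER: $file"
        return 1
    fi
}

# Check the annotation files used in this workflow, if they are present.
for f in SPO_external_functions.txt GENE_ID_external_functions.txt \
         LOCUS_external_functions.txt TnSeq_annotations.txt; do
    if [ -f "$f" ]; then
        check_functions_header "$f"
    fi
done
```

A failed check only reports the file name; it does not attempt to repair the file.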


```
# make the database
anvi-gen-contigs-database -f R-pom_genome.fasta --external-gene-calls DSS3_external_gene_calls_NEW.txt --skip-predict-frame -n R_POMEROYI_DSS3 -o R_POM_DSS3-contigs.db

# import curated annotations
anvi-import-functions -c R_POM_DSS3-contigs.db -i SPO_external_functions.txt
anvi-import-functions -c R_POM_DSS3-contigs.db -i GENE_ID_external_functions.txt
anvi-import-functions -c R_POM_DSS3-contigs.db -i LOCUS_external_functions.txt

# de novo annotation
anvi-run-hmms -c R_POM_DSS3-contigs.db
anvi-run-scg-taxonomy -c R_POM_DSS3-contigs.db
anvi-run-ncbi-cogs -c R_POM_DSS3-contigs.db -T 6
anvi-run-kegg-kofams -c R_POM_DSS3-contigs.db -T 6
anvi-run-pfams -c R_POM_DSS3-contigs.db -T 6

# import TnSeq mutant annotations
anvi-import-functions -c R_POM_DSS3-contigs.db -i TnSeq_annotations.txt
```


After this, the (meta)transcriptome samples were downloaded and quality-filtered with [FASTX-toolkit](http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_filter_usage), using the parameters from the methods section of the [Landa et al. paper](https://www.nature.com/articles/ismej2017117):


>Quality control was performed on 249 million 50-bp reads (10±2 million reads per sample; Supplementary Table S1) using the FASTX toolkit, imposing a minimum quality score of 20 over 80% of read length.
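The quoted thresholds map onto the `-q` (minimum quality score) and `-p` (minimum percent of bases meeting that score) options of `fastq_quality_filter`. The following is a minimal sketch of that filtering step, not the exact command used for these databases: the directory names `01_RAW` and `02_QC`, the wrapper name `quality_filter`, and the `-Q33` flag (which assumes Phred+33 quality encoding) are all assumptions.

```shell
# Thresholds taken from the quoted methods: quality >= 20 over >= 80% of the read.
MIN_QUAL=20
MIN_PERCENT=80

# Hypothetical wrapper around fastq_quality_filter for a single FASTQ file.
# -Q33 assumes Phred+33 encoded qualities; adjust it if the reads differ.
quality_filter() {
    local in_fastq="$1" out_fastq="$2"
    fastq_quality_filter -Q33 -q "$MIN_QUAL" -p "$MIN_PERCENT" -i "$in_fastq" -o "$out_fastq"
}

# Example loop; 01_RAW and 02_QC are placeholder directory names.
if command -v fastq_quality_filter >/dev/null 2>&1; then
    mkdir -p 02_QC
    for fq in 01_RAW/*.fastq; do
        [ -e "$fq" ] || continue
        quality_filter "$fq" "02_QC/$(basename "$fq" .fastq)-QC.fastq"
    done
fi
```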


The samples were then mapped to the _R. pomeroyi_ DSS-3 genome using [bowtie2](https://github.com/BenLangmead/bowtie2). The resulting SAM files were converted to BAM files, which were sorted and indexed. Here is an example set of mapping commands for one sample:

```
# map reads; bowtie2 writes its alignment summary to stderr, so redirect stdout first and then stderr into the log
bowtie2 --threads 4 -x 04_MAPPING/R_POM_DSS3 -U "$sample_file" --no-unal -S 04_MAPPING/${name}.sam > 00_LOGS/${name}-bowtie.log 2>&1

# convert SAM to BAM, keeping only mapped reads
samtools view -F 4 -bS 04_MAPPING/${name}.sam > 04_MAPPING/${name}-RAW.bam
rm 04_MAPPING/${name}.sam

# sort and index the BAM file
anvi-init-bam 04_MAPPING/${name}-RAW.bam -o 04_MAPPING/${name}.bam
rm 04_MAPPING/${name}-RAW.bam
```
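The mapping commands above rely on a bowtie2 index at `04_MAPPING/R_POM_DSS3` and on the variables `$sample_file` and `${name}`, none of which are defined in this shortened workflow. One way they could be set up is sketched below. The input directory `02_QC`, the helper `name_from_fastq`, and the rule of deriving `${name}` from the file name are assumptions for illustration only.

```shell
# Hypothetical helper: derive ${name} from a FASTQ path by dropping the
# directory and a trailing .fastq or .fastq.gz extension.
name_from_fastq() {
    local base
    base=$(basename "$1")
    base="${base%.gz}"
    echo "${base%.fastq}"
}

# Build the bowtie2 index once, then visit each sample file.
# Runs only if bowtie2-build is installed and the genome FASTA is present.
if command -v bowtie2-build >/dev/null 2>&1 && [ -f R-pom_genome.fasta ]; then
    mkdir -p 04_MAPPING 00_LOGS
    bowtie2-build R-pom_genome.fasta 04_MAPPING/R_POM_DSS3
    for sample_file in 02_QC/*.fastq; do
        [ -e "$sample_file" ] || continue
        name=$(name_from_fastq "$sample_file")
        echo "mapping $sample_file as $name"
        # run the bowtie2, samtools, and anvi-init-bam commands for this sample here
    done
fi
```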


Each BAM file was then profiled to create an anvi'o profile database containing one sample:

```
anvi-profile -c 03_CONTIGS/R_POM_DSS3-contigs.db -i 04_MAPPING/${name}.bam -o 05_PROFILE/${name} -S $sampname -T 4
```
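The profiling command takes a separate `$sampname` for the `-S` flag, and this workflow does not show how it was chosen. To our understanding, anvi'o expects sample names made of ASCII letters, digits, and underscores that do not start with a digit, so a file-derived `${name}` may need adjusting. The helper below is a hypothetical sketch of one conservative conversion, not the rule used to build these databases.

```shell
# Hypothetical helper: turn a file-derived name into a conservative sample name
# (letters, digits, underscores; prefixed with "S_" if it would start with a digit).
to_sample_name() {
    local s
    s=$(printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_')
    case "$s" in
        [0-9]*) s="S_$s" ;;
    esac
    echo "$s"
}

# Example: sampname=$(to_sample_name "$name")
```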


Finally, all the individual profiles were merged into a single profile database.

```
anvi-merge -c 03_CONTIGS/R_POM_DSS3-contigs.db -o 06_MERGED 05_PROFILE/*/PROFILE.db
```
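The merge command picks up only those sample directories that contain a `PROFILE.db`, so a sample whose profiling failed would be left out without any error. The sketch below shows one way to list such directories before merging. The helper name `missing_profiles` is hypothetical, and the directory layout is taken from the commands above.

```shell
# Hypothetical helper: print each sample directory under a profile root
# that does not contain a PROFILE.db file.
missing_profiles() {
    local root="$1" d
    for d in "$root"/*/; do
        [ -d "$d" ] || continue
        [ -f "${d}PROFILE.db" ] || echo "${d%/}"
    done
}

# Report incomplete samples before running anvi-merge.
missing=$(missing_profiles 05_PROFILE)
if [ -n "$missing" ]; then
    echo "No PROFILE.db found in:"
    echo "$missing"
fi
```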