Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save junaruga/5daad4a92305c896bc86cd7f4e74ab6d to your computer and use it in GitHub Desktop.
Save junaruga/5daad4a92305c896bc86cd7f4e74ab6d to your computer and use it in GitHub Desktop.
$ wc -l ilDeiPorc1.reads.*
244682 ilDeiPorc1.reads.fa
20045 ilDeiPorc1.reads.small.fa
264727 total
$ time docker run --rm -w /data/ -v /home/jaruga/tmp/mitohifi/exampleFiles/:/data/ -t docker.io/biocontainers/mitohifi:2.2_cv1 mitohifi.py -r /data/ilDeiPorc1.reads.small.fa -f /data/MW539688.1.fasta -g /data/MW539688.1.gb -t 4 -o 2 /data/ -v /home/jaruga/tmp/mitohifi/exampleFiles/:/data/ -t docker.io/biocontainers/mitohifi:2.2_cv1 mitohifi.py -r /data/ilDeiPorc1.reads.small.fa -f /data/MW539688.1.fasta -g /dat2022-10-25 16:33:56 [INFO] Welcome to MitoHifi v2. Starting pipeline...
2022-10-25 16:33:56 [INFO] Length of related mitogenome is: 15354 bp
2022-10-25 16:33:56 [INFO] Number of genes on related mitogenome: 37
2022-10-25 16:33:56 [INFO] Running MitoHifi pipeline in reads mode...
2022-10-25 16:33:56 [INFO] 1. First we map your Pacbio HiFi reads to the close-related mitogenome
2022-10-25 16:33:56 [INFO] minimap2 -t 4 --secondary=no -ax map-pb /data/MW539688.1.fasta /data/ilDeiPorc1.reads.small.fa | samtools view -@ 4 -S -b -F4 -F 0x800 > reads.HiFiMapped.bam
2022-10-25 16:33:56 [INFO] 2. Now we filter out any mapped reads that are larger than the reference mitogenome to avoid NUMTS
2022-10-25 16:33:56 [INFO] 2.1 First we convert the mapped reads from BAM to FASTA format:
2022-10-25 16:33:56 [INFO] samtools fasta reads.HiFiMapped.bam > gbk.HiFiMapped.bam.fasta
2022-10-25 16:33:56 [INFO] Total number of mapped reads: 137
2022-10-25 16:33:56 [INFO] 2.2 Then we filter reads that are larger than 15354 bp
2022-10-25 16:33:56 [INFO] Number of filtered reads: 137
2022-10-25 16:33:56 [INFO] 3. Now let's run hifiasm to assemble the mapped and filtered reads!
2022-10-25 16:33:56 [INFO] hifiasm --primary -t 4 -f 0 -o gbk.HiFiMapped.bam.filtered.assembled gbk.HiFiMapped.bam.filtered.fasta
2022-10-25 16:34:06 [INFO] 4. Let's run the blast of the contigs versus the close-related mitogenome
2022-10-25 16:34:06 [INFO] 4.1. Creating BLAST database:
2022-10-25 16:34:06 [INFO] makeblastdb -in /data/MW539688.1.fasta -dbtype nucl
2022-10-25 16:34:06 [INFO] Makeblastdb done.
2022-10-25 16:34:06 [INFO] 4.2. Running blast of contigs against close-related mitogenome:
2022-10-25 16:34:06 [INFO] blastn -query hifiasm.contigs.fasta -db /data/MW539688.1.fasta -num_threads 4 -out contigs.blastn -outfmt 6 std qlen slen
2022-10-25 16:34:06 [INFO] Blast done.
2022-10-25 16:34:06 [INFO] 5. Filtering BLAST output to select target sequences
2022-10-25 16:34:06 [INFO] Filtering thresholds applied:
2022-10-25 16:34:06 [INFO] Minimum query percentage = 50
2022-10-25 16:34:06 [INFO] Minimum query length = 80% subject length
2022-10-25 16:34:06 [INFO] Maximum query length = 5 times subject length
2022-10-25 16:34:06 [INFO] Filtering BLAST finished. A list of the filtered contigs was saved on ./contigs_filtering/contigs_ids.txt file
2022-10-25 16:34:06 [INFO] 6. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s).
2022-10-25 16:34:06 [INFO] Working with contig ptg000001l
2022-10-25 16:34:06 [INFO] Started ptg000001l circularization
2022-10-25 16:34:07 [INFO] ptg000001l circularization done. Circularization info saved on ./potential_contigs/ptg000001l/ptg000001l.circularisationCheck.txt
2022-10-25 16:34:07 [INFO] Started ptg000001l (MitoFinder) annotation
2022-10-25 16:36:52 [INFO] ptg000001l annotation done. Annotation log saved on ./potential_contigs/ptg000001l/ptg000001l.annotation_MitoFinder.log
2022-10-25 16:36:52 [INFO] Started ptg000001l rotation.
2022-10-25 16:36:52 [INFO] Rotation of ptg000001l done. Rotated is at ptg000001l.mitogenome.rotated.fa
Gene COX3 contains frameshift
Gene ATP6 contains frameshift
Gene COX2 contains frameshift
Gene COX1 contains frameshift
Gene ND2 contains frameshift
Gene ND1 contains frameshift
Gene CYTB contains frameshift
Gene ND4L contains frameshift
Gene ND4 contains frameshift
Gene ND5 contains frameshift
Gene ND3 contains frameshift
2022-10-25 16:36:52 [INFO] 7. Now the rotated contigs will be aligned
2022-10-25 16:36:52 [INFO] List of contigs that will be aligned: ['ptg000001l.mitogenome.rotated.fa']
2022-10-25 16:36:52 [INFO] MAFFT alignment will be called with:
mafft --quiet --clustalout --thread 4 all_mitogenomes.rotated.fa > all_mitogenomes.rotated.aligned.fa
2022-10-25 16:36:52 [INFO] Alignment done and saved at ./final_mitogenome_choice/all_mitogenomes.rotated.aligned.fa
2022-10-25 16:36:52 [INFO] 8. Now we will choose the most representative contig
/bin/MitoHiFi/getReprContig.py:96: UserWarning: Warning: representative contig contains frameshifts
warnings.warn("Warning: representative contig contains frameshifts")
2022-10-25 16:36:52 [INFO] Representative contig is ptg000001l that belongs to Cluster 0. This contig will be our final mitogenome. See all contigs and clusters in cdhit.out.clstr
2022-10-25 16:39:24 [INFO] 9. Calculating final stats for final mitogenome and other potential contigs.
Stats will be saved on contigs_stats.tsv file.
Gene ND3 contains frameshift
Gene COX3 contains frameshift
Gene ATP6 contains frameshift
Gene COX2 contains frameshift
Gene COX1 contains frameshift
Gene ND2 contains frameshift
Gene ND1 contains frameshift
Gene CYTB contains frameshift
Gene ND4L contains frameshift
Gene ND4 contains frameshift
Gene ND5 contains frameshift
2022-10-25 16:39:24 [INFO] Pipeline finished!
2022-10-25 16:39:24 [INFO] Run time: 328.83 seconds
real 5m31.986s
user 0m0.048s
sys 0m0.032s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment