A tool that quality-checks MAGs (Metagenome-Assembled Genomes) and SAGs (Single Amplified Genomes) by testing whether the genes a bacterium is expected to carry are all present.
H VN:Z:1.0
S 5 TGC
S 7 CCCCCCCCCC
S 1 ATGTC
S 4 AGTCC
S 6 TTC
S 2 CA
S 3 AG
P 1 1+,2+,5+,3+,6+,4+ 5M,2M,3M,2M,3M,5M
L 1 + 2 + 0M
#!/usr/bin/env python3
# coding: utf-8
"""
Read a GFF file and convert it to a CSV file aligned to GFA coordinates.
"""
import argparse
from collections import deque  # deque is defined in collections, not queue
#!/usr/bin/env python3
# coding: utf-8
"""
Pair-wise alignment algorithm for comparison of two genome sequences
usage: python3 alignment.py
"""
from abc import ABCMeta, abstractmethod
#!/bin/bash
# example: get_complete_genome.sh "Bifidobacterium longum"
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt
# Quote "$1" so a multi-word organism name reaches grep as a single pattern,
# and read the summary from the directory wget just saved it to.
for i in $(grep "$1" assembly_summary_refseq.txt | grep "Complete Genome" | awk -F "\t" '{print $(NF-2)}')
do
a=$(echo "$i" | awk -F "/" '{print $NF}')
wget "${i}/${a}_genomic.fna.gz"
done
date > download_log.txt
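GFA is a tab-delimited text format that originally came into use for genome assembly.
It is designed so that nodes represent nucleotide sequences and edges represent links between nodes. The type of each record is determined by the tag in its first column.
Here I describe only GFA version 1.0, and only the representations that vg view can recognize.
Note that the way P (path) records are written differs slightly between versions of vg (with v1.5.0 you can treat them the same way as in GFA-spec 1.0; v1.6 has not been checked).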
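As a minimal sketch of what "determined by the tag in the first column" means in practice, the snippet below groups the records of a small GFA 1.0 text by that tag and splits the step list of a P record into (segment, orientation) pairs. The function names parse_gfa and path_steps are hypothetical helpers for illustration and are not part of vg or any GFA library. The sample data is the small graph shown earlier, written with explicit tabs. The third column of the P record is kept as-is and not interpreted, since its meaning is the part that may vary between vg versions.

```python
from collections import defaultdict

# Sample GFA 1.0 records (fields are joined with tabs below).
SAMPLE_ROWS = [
    ("H", "VN:Z:1.0"),
    ("S", "5", "TGC"),
    ("S", "7", "CCCCCCCCCC"),
    ("S", "1", "ATGTC"),
    ("S", "4", "AGTCC"),
    ("S", "6", "TTC"),
    ("S", "2", "CA"),
    ("S", "3", "AG"),
    ("P", "1", "1+,2+,5+,3+,6+,4+", "5M,2M,3M,2M,3M,5M"),
    ("L", "1", "+", "2", "+", "0M"),
]
SAMPLE_GFA = "\n".join("\t".join(row) for row in SAMPLE_ROWS)


def parse_gfa(text):
    """Group GFA records by the tag in their first column (hypothetical helper)."""
    records = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        records[fields[0]].append(fields[1:])
    return records


def path_steps(records, path_name):
    """Return the (segment_id, orientation) pairs of one P record (hypothetical helper)."""
    for fields in records["P"]:
        if fields[0] == path_name:
            # Each step looks like "5+" or "5-": segment id followed by orientation.
            return [(step[:-1], step[-1]) for step in fields[1].split(",")]
    raise KeyError(path_name)


if __name__ == "__main__":
    recs = parse_gfa(SAMPLE_GFA)
    print(sorted(recs))            # record tags present in the file
    print(path_steps(recs, "1"))   # ordered walk through the segments
```

Keeping the parser limited to splitting on tabs and dispatching on the first column makes it easy to add version-specific handling of P records later without touching the S and L handling.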