Created February 19, 2013 16:56
Quick Python script to extract contig summary information (lengths and number of reads, as a tabular file) from an ACE assembly file, using the Biopython ACE parser for convenience.
#!/usr/bin/env python
# Example usage:
#
# $ python ace_to_contig_stats.py < example.ace > example_stats.tsv
#
import sys
from Bio.Sequencing import Ace

# Header row for the tab-separated output
sys.stdout.write("#Contig\tPadded length\tUnpadded length\tReads\n")
for contig in Ace.parse(sys.stdin):
    # Consensus sequence, including "*" padding (gap) characters
    seq = contig.sequence
    sys.stdout.write("%s\t%i\t%i\t%i\n" % (contig.name,
                                           len(seq),
                                           len(seq) - seq.count("*"),
                                           contig.nreads))
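
The padded and unpadded length arithmetic in the script can be checked without an ACE file or Biopython. The following is a minimal sketch, not part of the original gist: the helper name `contig_row` and the example consensus string are made up for illustration, and it assumes only that padding positions are marked with `*`, as in the script above.

```python
def contig_row(name, padded_consensus, nreads):
    """Format one output row (hypothetical helper, not in the original script).

    Assumes gaps in the padded consensus are marked with "*".
    """
    padded = len(padded_consensus)
    # Unpadded length is the padded length minus the number of "*" characters
    unpadded = padded - padded_consensus.count("*")
    return "%s\t%i\t%i\t%i\n" % (name, padded, unpadded, nreads)


# Made-up consensus: 11 characters, 3 of which are "*" pads
row = contig_row("Contig1", "ACGT*AC**GT", 5)
print(row, end="")
```

For this made-up input the row reports a padded length of 11 and an unpadded length of 8, in the same column order as the script's header line.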