Created October 8, 2018 17:58
Split fasta into multiple fasta files with a max size
#!/usr/bin/env python
'''
Split a FASTA file into multiple smaller FASTA files.

Use like this:

    python SPLIT-FASTA.py fasta-name.fa output-prefix SIZE

where SIZE is the maximum number of sequences per output file.

So if your input FASTA was contigs.fa and had 190 sequences, then:

    python SPLIT-FASTA.py contigs.fa contigs-mini 50

would result in four output files:

    contigs-mini_0.fa
    contigs-mini_1.fa
    contigs-mini_2.fa
    contigs-mini_3.fa
'''

import sys

import anvio.fastalib as f

input_file_name = sys.argv[1]
output_file_prefix = sys.argv[2]
max_number_per_file = int(sys.argv[3])

c = f.ReadFasta(f_name=input_file_name)

n = 0  # number of sequences written to the current output file
m = 0  # index of the current output file

output_file_name = '%s_%s.fa' % (output_file_prefix, m)
output_fasta = f.FastaOutput(output_file_path=output_file_name)

for header, seq in zip(c.ids, c.sequences):
    # once the current file is full, close it and start the next one
    if n == max_number_per_file:
        n = 0
        m += 1
        output_fasta.close()
        output_file_name = '%s_%s.fa' % (output_file_prefix, m)
        output_fasta = f.FastaOutput(output_file_path=output_file_name)

    n += 1
    output_fasta.write_id(header)
    output_fasta.write_seq(seq)

output_fasta.close()
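
For readers who do not have anvi'o installed, the same chunking logic can be sketched with only the Python standard library. This is not part of the original gist: `read_fasta` and `split_fasta` are hypothetical helper names, and the parser assumes a plain-text FASTA file in which every record starts with a `>` header line. Treat it as a minimal illustration rather than a replacement for `anvio.fastalib`.

```python
import itertools


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain-text FASTA file.

    Hypothetical helper: assumes each record starts with a '>' line and
    joins multi-line sequences into a single string.
    """
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(input_path, output_prefix, max_per_file):
    """Write at most max_per_file records per output file.

    Output files are named <output_prefix>_<index>.fa, matching the naming
    used by the script above. Returns the list of paths written.
    """
    if max_per_file < 1:
        raise ValueError("max_per_file must be a positive integer")

    records = read_fasta(input_path)
    paths = []
    for index in itertools.count():
        # take the next batch of records from the shared generator
        batch = list(itertools.islice(records, max_per_file))
        if not batch:
            break
        path = "%s_%d.fa" % (output_prefix, index)
        with open(path, "w") as out:
            for header, seq in batch:
                out.write(">%s\n%s\n" % (header, seq))
        paths.append(path)
    return paths
```

Calling `split_fasta("contigs.fa", "contigs-mini", 50)` on a 190-sequence file would write `contigs-mini_0.fa` through `contigs-mini_3.fa`. One difference from the script above: for an empty input this sketch writes no files, whereas the script always creates `<output-prefix>_0.fa`.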