@IsmailM
Last active August 29, 2015 14:27
Split a multi-FASTA file into separate FASTA files, using each record's header as the filename
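# Split a multi-FASTA file into one FASTA file per record, keeping only
# records whose sequence length lies within [MIN, MAX].
#
# Requires BioRuby: gem install bio
#
# Usage:
#   ruby split_fasta.rb FASTA_FILE MIN MAX OUTPUT_DIR
require 'bio'
require 'fileutils'

abort 'Usage: ruby split_fasta.rb FASTA_FILE MIN MAX OUTPUT_DIR' unless ARGV.length == 4

fasta, min, max, output_dir = ARGV
min = Integer(min) # raises ArgumentError on non-numeric input
max = Integer(max)

# Create the output directory once, up front (mkdir_p also creates parents).
FileUtils.mkdir_p(output_dir)

# With a block, the flat file is closed automatically when iteration ends.
Bio::FlatFile.open(Bio::FastaFormat, fasta) do |biofastafile|
  biofastafile.each_entry do |entry|
    # Skip records whose sequence length is outside [min, max].
    len = entry.seq.length
    next if len < min || len > max

    # Build a filesystem-safe name from the header line: replace characters
    # that are invalid in filenames, then keep at most the first 51 characters.
    id = entry.definition.gsub(%r{[/\\:*?"<>|]}, '_')[0..50]
    out_file = File.join(output_dir, "#{id}.fa")

    # entry.entry is the original record text (header plus sequence lines).
    File.open(out_file, 'w') { |f| f.puts entry.entry }
  end
end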
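If installing BioRuby is not an option, the same split can be sketched in plain Ruby. This is a minimal, hypothetical alternative, not the gist's method: `parse_fasta`, `safe_name`, and `split_fasta` are illustrative names, the parser is hand-rolled (it assumes well-formed FASTA with `>` header lines), and unlike the script above it rewrites each sequence wrapped at 60 columns instead of copying the original record text verbatim.

```ruby
require 'fileutils'

# Hypothetical helper: parse FASTA text into [header, sequence] pairs.
# The header excludes the leading '>'; all whitespace inside sequence
# lines is removed and the lines are concatenated.
def parse_fasta(io)
  records = []
  header = nil
  seq = ''.dup
  io.each_line do |line|
    line = line.chomp
    if line.start_with?('>')
      records << [header, seq] if header
      header = line[1..-1]
      seq = ''.dup
    elsif header
      seq << line.gsub(/\s/, '')
    end
  end
  records << [header, seq] if header
  records
end

# Hypothetical helper mirroring the script's naming rule: replace characters
# that are unsafe in filenames with '_' and keep at most 51 characters.
def safe_name(header)
  header.gsub(%r{[/\\:*?"<>|]}, '_')[0..50]
end

# Write each record whose sequence length is in [min, max] to
# output_dir/<safe_name>.fa, wrapping the sequence at 60 characters per line.
# Returns the paths written, in input order.
def split_fasta(io, min, max, output_dir)
  FileUtils.mkdir_p(output_dir)
  written = []
  parse_fasta(io).each do |header, seq|
    next if seq.length < min || seq.length > max

    path = File.join(output_dir, "#{safe_name(header)}.fa")
    File.open(path, 'w') do |f|
      f.puts ">#{header}"
      seq.scan(/.{1,60}/) { |chunk| f.puts chunk }
    end
    written << path
  end
  written
end
```

For example, `File.open('input.fa') { |io| split_fasta(io, 100, 5000, 'out') }` would write one file per qualifying record into `out/`. As with the BioRuby version, two records whose sanitized, truncated headers are identical map to the same filename, so the later one overwrites the earlier.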