Skip to content

Instantly share code, notes, and snippets.

@fstrozzi
Last active August 29, 2015 14:10
Show Gist options
  • Save fstrozzi/a6dd0daba816ea2fbd5d to your computer and use it in GitHub Desktop.
Save fstrozzi/a6dd0daba816ea2fbd5d to your computer and use it in GitHub Desktop.
Using HTSJDK from JRuby to parse a BAM file and filter for number of mismatches per read
#!/usr/bin/env ruby
require "htsjdk-1.125.jar" # https://github.com/broadinstitute/picard/releases/download/1.125/picard-tools-1.125.zip
require "logger"
java_import "htsjdk.samtools.SAMFileReader"
java_import "htsjdk.samtools.SAMFileWriterFactory"
logger = Logger.new(STDOUT)
if ARGV.size < 3
puts "USAGE: [BAM in] [% of mismatch] [BAM out]"
exit
end
bam_file_in = ARGV.shift
mismatch = ARGV.shift.to_f
bam_file_out = ARGV.shift
bam_in = SAMFileReader.new(java.io.File.new bam_file_in)
header = bam_in.getFileHeader
sam_write_factory = SAMFileWriterFactory.new
bam_out = sam_write_factory.makeBAMWriter(header,true,java.io.File.new(bam_file_out))
bam_in.each do |aln|
begin
unless aln.getReadUnmappedFlag
if (aln.getAttribute("NM").to_f / aln.getReadLength)*100 <= mismatch then
bam_out.addAlignment(aln)
end
end
rescue
logger.error("Processing read: #{aln.getReadName}")
end
end
bam_out.close
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment