Skip to content

Instantly share code, notes, and snippets.

@maasha
Created September 19, 2013 07:18
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save maasha/6620031 to your computer and use it in GitHub Desktop.
Save maasha/6620031 to your computer and use it in GitHub Desktop.
Ruby script for the reconstruction of deteriorated index in paired end Illumina reads. maasha/seq and maasha/fastq are from www.biopieces.org
#!/usr/bin/env ruby
require 'pp'
require 'maasha/fastq'
require 'maasha/seq'
map_file = "/home/maasha/Data/KimMagnussen/20130904_hiseq2a_rescue/sample_table.txt"
barcode_hash = {}
barcode1_hash = {}
barcode2_hash = {}
File.open(map_file) do |ios|
ios.each do |line|
next if line[0] == '#'
fields = line.chomp.split("\t")
sample_name = fields[1]
barcode = fields[5]
barcode1, barcode2 = barcode.split('_')
barcode_hash[barcode1 + barcode2] = sample_name
barcode1_hash[barcode1] = sample_name
barcode2_hash[barcode2] = sample_name
end
end
files = Dir.glob("/home/maasha/data/KimMagnussen/20130904_hiseq2a_rescue/20130904_hiseq2a/Undetermined_indices/**/*fastq.gz")
fh_hash = {}
files.each do |file|
STDERR.puts "Processing file: #{File.basename(file)}"
lane_info = File.basename(file, ".fastq.gz").split('_')[2 .. -2].join('_')
Fastq.open(file) do |ios|
ios.each do |entry|
barcode = entry.seq_name.split(':').last
barcode1 = barcode[0 .. 7]
barcode2 = barcode[8 .. 15]
if barcode1_hash[barcode1]
if barcode2_hash[barcode2]
# STDERR.puts "OK: barcode2: #{barcode2} found in #{entry.seq_name}"
else
barcode2_regex = /^#{barcode2.tr('N', '.')}$/
hits_hash = {}
barcode2_hash.each_key do |barcode2|
if barcode2.match barcode2_regex
hits_hash["#{barcode1}_#{barcode2}"] = barcode_hash[barcode1 + barcode2]
end
end
if hits_hash.size == 0
# STDERR.puts "Warning: empty hits hash"
elsif hits_hash.size > 1
# STDERR.puts "Warning: multiple hits: #{hits_hash.size}"
else
sample = hits_hash.first.last
if sample
outfile = "#{sample}_#{lane_info}.fastq"
unless fh_hash[outfile]
fh = File.open(outfile, 'w')
fh_hash[outfile] = fh
end
fh = fh_hash[outfile]
fh.puts entry.to_fastq
else
# STDERR.puts "Warning: no sample for barcode: #{hits_hash.first}"
end
end
end
else
# STDERR.puts "Warning: barcode1: #{barcode1} not in #{entry.seq_name}"
end
end
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment